Realism trumps hope at the EPA


Scott Pruitt’s resignation from the US Environmental Protection Agency was long overdue. 
But the threat to science posed by Trump and his allies remains. 


The most remarkable thing about Scott Pruitt’s resignation from
the US Environmental Protection Agency (EPA) is that it took
so long. By all accounts, he was unfit to lead one of the world’s
top science-based regulatory agencies. It wasn’t just that the former 
Oklahoma attorney-general had a well-documented history of con- 
sorting with industry to fight the agency. It was his contradictory 
behaviour — exemplified by the installation of an expensive sound- 
proof phone booth in his office — which put a premium on secrecy 
even as he made grand proclamations about transparency. But by far 
the worst was Pruitt’s utter disregard for both the science and the sci- 
entists under his charge — as we highlight in a News Feature this week 
(see page 316). 

Ultimately, Pruitt seems to have been felled by the impunity he 
exhibited over the course of nearly a year and a half at the agency. 
Lawmakers on both sides of the political aisle raised alarms over his 
lavish spending and a series of alleged ethical transgressions that are 
more typical of crony governments elsewhere in the world. 

His departure is welcome, but it would be naive to think that 
the prospects for the agency and its scientists are any brighter. His 
agenda — the same one as US President Donald Trump — remains 
intact. Trump made this all too clear in a pair of tweets announcing 
Pruitt’s resignation on 5 July. The president declared that Pruitt had 
done an “outstanding job”, and said that the new acting administrator,
Andrew Wheeler, a former coal lobbyist, “will continue on with our
great and lasting EPA agenda”.

Trump has yet to formally nominate Wheeler as the next EPA 
administrator, but the move would be in keeping with the president's 
approach. A lawyer by training, Wheeler spent 4 years at the agency 
in the early 1990s, under former presidents George H. W. Bush and 
Bill Clinton. He later served as a top aide on the Senate Environment 
and Public Works Committee under Oklahoma Republican James 
Inhofe, a leading climate sceptic in Congress. Wheeler knows how 
the agency works, and is comfortable on Capitol Hill. In the words of 
one EPA scientist, who asked for anonymity, Wheeler is “a supremely 
effective and precise Washington operative”. This, of course, is both 
praise and a warning. 

Wheeler will probably restore some kind of normal order at 
the agency, which means following conventional procedures, re- 
establishing fractured relations with staff scientists and avoiding the 
kind of embarrassing headlines that plagued Pruitt’s tenure. Already, 
in his first week as acting administrator, Wheeler has delivered an 
all-hands address at the agency’s research campus in Durham, North 
Carolina. That stands in stark contrast to Pruitt, who quietly dipped in 
and out of the campus a few weeks ago, before he stepped down, with 
no word to the full staff. Not once during his tenure did Pruitt make 
time to address the EPA’s Office of Research and Development, which 
houses the bulk of the agency’s scientists — hardly the way to either 
inspire loyalty or demonstrate he was on top of his brief. 

Under a new boss, EPA researchers might even be able to present
their findings once again to the leadership, as the administration
deliberates over environmental and public-health regulations. Such 
scientific consultations — fundamental to the establishment of sci- 
ence-based policies that can withstand the inevitable legal challenges 
that follow — were often eschewed under Pruitt, who showed little 
regard for the importance of evidence. 

Scientists should be wary about celebrating Pruitt’s exit. They should 
be careful what they wish for. The problem is that if Wheeler — or 
whoever takes on the job full-time — is more effective than Pruitt (and 
they could hardly be otherwise), then Trump’s problematic policies 
are likely to have more impact, too. And that could spell more trouble 
for public health and the environment, not just in the United States 
but around the globe — at a time when a sound and evidence-based 
approach to both has never been so critical. 

A fundamental goal of many of Trump’s efforts and policies is to
relieve US industry of what he regards as regulatory burdens.
Republican rhetoric has been trending in that direction for years,
particularly when it comes to regulations that combat climate change
but that industries find expensive or cumbersome. Indeed, the vast
majority of conservative lawmakers have either actively disavowed
mainstream science or turned a blind eye to the pressing need to
address one of the biggest challenges of the twenty-first century.
It’s a disgrace that will go down in the history books, but
Trump and his team have pushed things to a new extreme. Rather than
simply rolling back regulations, Pruitt sought to straitjacket the EPA 
and undermine the role of both science and scientists in regulatory 
policy. For example, he banned scientists with EPA grants from serv- 
ing on the agency’s advisory boards, and proposed a rule that would 
prevent the agency from citing public-health research for which the 
underlying data are not publicly available — including high-quality 
epidemiological studies that help to provide the technical basis for cur- 
rent air-quality regulations, but whose data must be partially hidden 
to protect patients’ identities. 

Republicans on Capitol Hill have provided a glimmer of hope by 
repeatedly rejecting Trump’s proposals to slash the EPA budget as 
well as funding for climate and energy research at other agencies, 
but money alone won’t solve the problems that EPA scientists face
today. What’s needed are policies that give deference to evidence and 
that allow agency researchers to follow the science wherever it might 
lead — even if politicians don't like the implications. 

Environmentalism needn't be a partisan issue. It was one of 
the Republicans’ own, Richard Nixon, who oversaw the creation 
of the EPA, and the last major upgrade to the Clean Air Act came 
under the first Bush administration. Wheeler might be more success- 
ful in implementing Trump’s policies, but that’s dangerous, as Trump 
is completely out of touch with scientific reality.


Lost and found


European funders are right to consider the 
career prospects of young scientists. 


It is a century since the first génération perdue came of age. The
phrase is attributed to US writer Gertrude Stein, who heard it as a
casual insult aimed by a garage boss at a young French mechanic
who was working — too slowly — on Stein’s car. The term is now 
generally used to describe a group of people who are lost to society. 
So when European officials spoke at a conference session last week 
called “The lost generation of European scientists”, for many
participants the name would have conjured up thoughts of an exodus of
talented early-career researchers, who are fed up with the insecurity 
of short-term jobs and with dwindling opportunities in academia. 
And so it should have: in many disciplines, that issue is real, growing 
and serious. Young and early-career researchers need the problem to 
be taken seriously — and so does the rest of the scientific community. 
Figures are difficult to come by, but less than one-fifth of US postdocs 
secure a tenured research position, and the situation is even more 
competitive in Europe. 

The ‘lost generation’ tag has another, more subtle meaning. 
Popularized by US writer Ernest Hemingway, it was used to 
describe the age group — Hemingway included — that had been 
left disoriented and confused by growing up amid the horrors and 
chaos of the First World War. Lost, not missing. The distinction is 
important. Careers outside academia are just as valuable and senior 
scientists must acknowledge this. Nevertheless, young researchers 
are too often led to believe that a non-academic career is inferior, so 
individual scientists who find they need to look elsewhere often feel 
let down, deceived and cynical. 

Last week’s event, held at the EuroScience Open Forum in Toulouse,
France, covered all of that ground. The session was well attended and
was frank about the scale of the problem and the difficulty of finding 
solutions. This might indicate that European funders and policymakers 
are catching up with the United States, where the crisis of confidence 
and opportunity among young scientists — especially in biomedi- 
cine — has been widely debated for at least a decade. That would be 
good news. The bad news is that Europe’s fragmented, variable national 
research bodies and strong university autonomy make it much easier
to acknowledge the problem than to change the systems that cause it.

At the meeting, European Research Council president Jean-Pierre 
Bourguignon hinted at an obvious fix: increase funding for scientific 
research and create more permanent academic jobs. But that's a big 
ask, and one that would take time. A more-immediate solution calls
for more-specific and targeted changes. One is the creation of more
full-time staff scientist positions, although such posts (with
benefits such as pensions) raise institution costs.

As we have argued previously (Nature
550, 429; 2017), there is a pressing need for
greater transparency about the likelihood of PhD students and post- 
docs following an academic career to the higher levels. A suggestion 
made at last week’s session — and one that Nature endorses — is that 
universities and other institutions should track and provide data on 
how many academic jobs are available at each level, and list the destina- 
tion of every scientist who moves on. The US National Academies has 
made an attempt at doing this for postdocs, and the European Science 
Foundation has tried to track the fate of Europe’s PhD holders. Both 
are good models to follow. 

Better information won't solve all the problems of all the ‘lost’ 
researchers, but it will at least provide them with a map as they decide 
on their next move. Those who supervise PhD students and postdocs 
must show them such a map, and take responsibility for preparing 
them for non-academic careers. What might look like a loss for 
academia can still be a great gain for society.


Gone rogue 


Officials and scientists need help to track down 
source of a worrying rise in CFC emissions. 


environmental stewardship, officials who safeguard Earth's
ozone layer are facing an unexpected crisis: how to identify
and cut off a rogue new source of ozone-destroying chemicals
(S. A. Montzka et al. Nature 557, 413-417; 2018). If not stopped, 
the emissions of CFC-11 might delay by several decades the heal- 
ing of ozone holes that appear at high latitudes early each spring. As 
expected, the issue featured heavily at last week’s meeting in Vienna 
of the Open-Ended Working Group (OEWG) of the Montreal Pro- 
tocol, which protects the ozone layer. Since the protocol’s launch in 
1987, countries have curbed the use of ozone-depleting chemicals in 
refrigeration and other industrial processes. 

Ahead of the meeting, media reports and an analysis by the London- 
based Environmental Investigation Agency — a non-governmental 
organization with observer status in the Montreal Protocol — used 
interviews with company executives and information contained in 
advertisements to suggest that foam-manufacturing companies in 
rural China are to blame. Chinese delegates in Vienna made it clear 
that they take the matter seriously and, by all accounts, the issue has 
gone up to the level of Chinese President Xi Jinping. But they remained 
extremely reluctant to concede any serious wrongdoings on the part of 
Chinese companies, or government negligence in their oversight. This
is understandable given that there is not yet definite evidence concern- 
ing the sources, quantity, duration or nature of the rogue emissions. 

The suspicion that Chinese factories are the main — perhaps the 
sole — source of the damaging CFC-11 chemicals cannot be dis- 
missed. But for now, increased vigilance must apply to the whole of 
South and East Asia. To pinpoint the source of the rogue emissions 
precisely, members of the Montreal Protocol's scientific assessment 
panel are working to analyse the most recent data from the region's 
atmospheric monitoring stations, including those of South Korea and 
Japan. Governments must make available, without delay, any data 
required for further analysis, and should also provide any other intel- 
ligence, such as that from commercial register entries, advertisements 
or customs, that could help to pin down any source of the emissions. 
The issue is a test of the strength and muscle of the Montreal Protocol 
regime, which must mobilize all the pieces — science, monitoring, 
verification and, possibly, sanctions. Already, four years have elapsed 
since scientists observed and reported the worrying CFC spike. 
What’s needed now, besides enduring vigilance, is a rapid political 
and institutional effort. 

There is no doubt that China has, over the past few years, stepped 
up its environmental efforts, including those tackling air pollution 
and greenhouse-gas emissions. If Chinese sources of CFC-11 produc- 
tion are confirmed, the government should engage its full enforce- 
ment capacity to stop it immediately. Ironically, the current crisis is an 
opportunity for China to demonstrate its emerging leadership in the 
enforcement of global environmental policies. If the Montreal Proto- 
col survives this test, the most beneficial environmental pact the world 
has ever undertaken will surely emerge stronger than ever.


India tests crew pod


India’s space agency has 
successfully tested a crew 
escape system, which puts 
the country a step closer
to achieving human space
flight. A simulated crew 
module weighing 12.6 tonnes, 
containing the unoccupied 
escape system, blasted off 
from the Satish Dhawan 
Space Centre in Sriharikota 
on 5 July. The module was 
propelled by seven motors 
that are designed to move 
crew members away from a
threat without exceeding safe 
g forces. It reached an altitude 
of nearly 2.7 kilometres before 
separating from the motors, 
deploying its parachutes and 
drifting back to Earth. The 
module was recovered in the 
nearby Bay of Bengal. 


Sterile mosquitoes 


A mosquito-control trial 
co-sponsored by the Google 
life-sciences spin-off Verily 
of South San Francisco, 
California, has passed one of 
its first big tests. On 10 July, 
Australia’s funding agency, 
the Commonwealth Scientific 
and Industrial Research 
Organisation, which is 
another co-sponsor, reported 
a greater than 80% reduction 
in dengue-transmitting 
Aedes aegypti mosquitoes 
during a five-month trial
in three Australian towns
in North Queensland. The
trial involved releasing male 
mosquitoes that had been 
rendered sterile by infection 
with Wolbachia bacteria. 
Sterile males, released into 
homes, compete for mates 
with other males, resulting 
in an eventual population 
reduction. 


Cosmic maps

A European space-telescope
project that tracked the faint
afterglow of the Big Bang
released its final and most
precise maps of the early
Universe on 17 July (see
go.nature.com/2jt0sbi). The
European Space Agency's
Planck telescope launched in
2009 and finished surveying
the cosmic microwave
background in 2013, but the
science team has revamped
its data-analysis techniques
to improve the precision of
its measurements of crucial
features of the Universe.
The Planck data continue
to predict that the current
Universe should expand about
9% slower than observations
of relatively nearby galaxies
suggest. The telescope did
not detect the signature of
gravitational waves from the
early stages of the Big Bang,
which would point to an early,
exponential expansion known
as inflation, although future,
more-sensitive experiments
could yet find such a
signature.


South Africa unveils huge radio telescope


This radio-wave image of the central regions
of the Milky Way was unveiled at the opening
ceremony of South Africa's MeerKAT radio
telescope on 13 July. The 4.4-billion-rand
(US$330-million) observatory has 64 dishes,
each 13.5 metres in diameter. It will observe
transient astrophysical events, including fast
radio bursts, and conduct surveys such as
mapping hydrogen abundance across cosmic
history. MeerKAT will eventually be part of the
Square Kilometre Array (SKA), a facility under
construction in South Africa and Australia,
and it is already boosting the country’s
scientific community and drawing researchers
from around the world. “Back in the day, our
astronomers went abroad to do astronomy.
Now we’re the attraction,” says Justin Jonas,
chief technologist at the Cape Town-based
South African Radio Astronomy Observatory.
See go.nature.com/2jtg5sd for more.


EVENTS


Herbicide lawsuits
A US federal judge ruled last
week that lawsuits against
Monsanto, an agriculture
corporation based in St Louis,
Missouri, can proceed.
One such suit alleges that
the company’s herbicide, a
glyphosate product marketed
as Roundup, was a substantial
factor in the development of
the cancer non-Hodgkin's
lymphoma in a 42-year-old
groundskeeper. Brent Wisner,
the plaintiff’s lawyer, pointed
to evidence linking glyphosate
exposure among agricultural
workers to cancer. Arguments
for this lawsuit began on 9 July,
and it is the first of many
similar suits filed in the United
States to go to trial. “The
scientific evidence clearly
shows that glyphosate was not
the cause” of cancer, wrote
Monsanto vice-president Scott
Partridge in a statement to
Nature.


Indian universities 
The Indian government 
named its first six ‘institutes
of eminence’ on 9 July, with
the aim of elevating Indian 
universities in global rankings. 
Three public and three 
private universities will each 
receive US$145.7 million 
over five years and be given 
greater autonomy to hire 
foreign faculty members and 
students. The government 
was originally due to name
20 institutions of eminence in
April. Some academics told
Nature the delay was because
of disagreements on how to
select institutions. Others have
also criticized the inclusion
of the Jio Institute, a private
institution that has yet to be set 
up. “Any institution needs to be 
in existence and have shown its 
worth before being considered 
for institutions of eminence 
status,” says Subhash Lakhotia, 
a cytogenetics researcher at 
Banaras Hindu University in 
Varanasi. 


Terrapin trouble 
The US attorney’s office 
indicted David Sommers,
a Pennsylvania resident, on
10 July for allegedly poaching
thousands of protected 
terrapins and their eggs from 
New Jersey salt marshes, and 
then selling them from his 
home. An investigation by the 
US Fish and Wildlife Service 
found that Sommers sold 
more than 3,500 diamondback 
terrapins (Malaclemys 
terrapin), a turtle species 
prized in the pet trade for the 
diamond-shaped markings 
on its shell, in the United 
States and Canada. Sommers 
now awaits trial for allegedly 
violating state and federal laws. 


NASA nominee


US President Donald Trump
nominated James Morhard,
a Senate staff member, to the
number-two job at NASA on
12 July. Morhard (pictured)
has degrees in business and law
but no space expertise; he is
currently serving as the Senate's
deputy sergeant at arms. If
confirmed, Morhard would
join NASA administrator
Jim Bridenstine, a former
politician who had reportedly
been seeking a technical expert
as his deputy. In a statement,
Bridenstine said that “this
administration is committed to
American leadership in space”.


TREND WATCH


An analysis of almost
35,000 biomedical researchers has
found that after women secure
their first major research grant
from the US National Institutes
of Health (NIH), they are almost
as successful as men at netting
more NIH funding. The finding
casts doubt on the belief that
women leave science at a faster
rate than men as their careers
progress. Women remain
under-represented among NIH grantees,
however, winning less than
one-third of the grants despite earning
nearly half the PhDs in the field.


Mail-fraud deal 


A mechanical engineering 
professor at the University of 
Colorado Boulder pleaded 
guilty to mail fraud earlier this 
month. According to the US 
attorney's office in Denver, 
Colorado, Oleg Vasilyev 
applied for a US$234,000 
federal contract with the
US Department of Energy's
(DOE’s) Los Alamos National
Laboratory in New Mexico
without informing the
university. The funds were
transferred to a university
account, to which Vasilyev
submitted a series of claims
for unallowable expenses,
including more than $140,000
for travel. In a plea agreement
dated 6 July, Vasilyev agreed
to repay the university
$185,879; the university has
reimbursed the DOE for
the misappropriated funds.
The guilty plea terminated 
Vasilyev’s employment, a 
university spokesperson said. 


POLICY 


Innovation council 
After Brexit, the United 
Kingdom will probably not be 
able to take part in all aspects 
of the European Union's
next major research funding
programme, Horizon Europe,
EU officials have confirmed.
The European Commission's
provisional plans for the 7-year
programme, which begins
in 2021 and has a proposed
budget of €100 billion
(US$117 billion), suggested
that it would be the most
global yet, with many parts
of it open to non-EU nations
paying a subscription fee.
But at the EuroScience Open
Forum in Toulouse, France,
on 11 July, Robert-Jan Smits,
the European Commission's
special envoy for open data
and former director-general
of research, said that the
European Innovation Council,
a programme launching in
2021, will not be accessible
to the United Kingdom
and other non-EU ‘third’
countries because the council
is aimed at supporting start-up
companies in the EU. Precise
details are still to be negotiated
as the EU hammers out the
legal framework for third
countries in Horizon Europe.


Trend watch figure caption: Female scientists maintain NIH project
grants for nearly as long in their careers as do male scientists,
according to an analysis of 34,770 researchers.


Fossil-fuel vote 


Ireland’s lower house of 
parliament voted on 12 July 
to divest its €8.9-billion 
(US$10.4-billion) state- 
owned investment fund of 
fossil-fuel companies “as 
soon as practicable”. The 
upper house must now vote 
on the measure, but backers 
say that the proposal has the 
support of the Irish prime 
minister and is expected to 
become law. If that happens, 
Ireland would become the 
first country to set a goal of 
withdrawing all investments 
in fossil fuels. Norway is also 
considering whether to divest 
its $1-trillion state-owned fund 
of its holdings in fossil-fuel 
companies, but the government 
has yet to come to a decision. 


Integrity inquiry 
The United Kingdom
should establish a committee
to monitor universities’
misconduct investigations,
a parliamentary inquiry
recommended on 11 July. The
inquiry found that one in four
UK universities do not comply
with transparency guidelines
for such investigations. The
report, released by the House
of Commons Science and
Technology Committee, says
there is a need to address
the potential conflicts
of universities policing
themselves. Although the 
advocacy group Universities 
UK issued a concordat in 2012 
that requires universities to 
deal with misconduct cases 
transparently and robustly, 
there are currently no sanctions 
for non-compliance. Some 
universities told the inquiry 
that they feared disclosing 
cases of research fraud would 
undermine their reputation. 
See go.nature.com/2meydnz 
for more. 
NEWS IN FOCUS


Police in China expand sewage analysis to monitor illegal drug use p.310

Gates Foundation trial ends as journals grapple with open-access issue p31

Ten new moons of Jupiter help trace Solar System history p.312

What scientists have faced inside Trump’s EPA p.316


The IceCube lab is in Antarctica, on the South Pole. 


ASTROPHYSICS 


Particle traced from space 


When a neutrino streaked through Antarctica, astrophysicists raced to find the source.


BY DAVIDE CASTELVECCHI 


A single subatomic particle detected at
the South Pole last September is helping
to solve a major cosmic mystery:
what creates electrically charged cosmic rays, 
the most energetic particles in nature. 
Follow-up studies on the particle's trajectory 
by more than a dozen observatories suggest 
that researchers have, for the first time, identi- 
fied a distant galaxy as a source of high-energy 
neutrinos. This discovery could, in turn, help 
scientists to pin down the still-mysterious
sources of cosmic rays, the protons and atomic
nuclei that arrive at Earth from outer space. 
The same mechanisms that produce cosmic 
rays should also make high-energy neutrinos. 
Multiple teams of researchers from around 
the world described the neutrino’s source 
in at least seven papers released on 12 July. 
“Everything points to this as the ultra-bright, 
energetic source — a gorgeous source,” says 
Elisa Resconi, an astroparticle physicist at the 
Technical University of Munich in Germany. 
Astrophysicists have proposed a number of 
scenarios for astrophysical phenomena that 
could produce both high-energy neutrinos and
their electrically charged counterparts, cosmic 
rays. But until now, they had not managed to 
unambiguously trace any of these particles 
back to their source. 


MUON ALERT 

The story began on 22 September 2017, when 
an electrically charged particle called a muon 
zipped through the Antarctic ice cap at close to 
the speed of light. The IceCube Observatory,
an array of more than 5,000 sensors buried in
a cubic kilometre’s worth of ice, detected
flashes of light that the muon produced in
its wake. The particle seemed to emerge from 
below the detector — an orientation that indi- 
cated it was the decay product of a neutrino 
that had come from below the horizon. Muons 
can travel only so far inside matter, whereas 
neutrinos often pass through the entire planet 
unimpeded; most of the muons that IceCube 
detects originate from neutrinos that have 
crashed with a particle inside Earth. 

Within seconds, a computer cluster at the 
US National Science Foundation’s Amundsen- 
Scott South Pole Station had reconstructed the 
precise path of the particle and recognized that 
the muon had come from a highly energetic 
neutrino; 43 seconds after the event, the sta- 
tion sent an automated alert to a network of 
astronomers through a satellite link. It tagged 
the neutrino as IceCube-170922A. 

After receiving the alert, Derek Fox, an 
astrophysicist at Pennsylvania State University 
in University Park, quickly secured observing 
time on the X-ray observatory Swift, which 
orbits Earth. He and his team found nine 
sources of high-energy X-rays close to where 
the neutrino had come from. Among them 
was an object called TXS 0506+056. This is a 
blazar, a galaxy with a supermassive black hole 
at the centre and a known source of γ-rays. In
a blazar, the black hole stirs up gas to tempera- 
tures of millions of degrees and shoots it out 
of its poles in two highly collimated jets. In
this case, one of the jets points in the direction
of the Solar System. Fox’s team announced its 
findings to the astronomical community the 
next day. 

In the following days, another team
inspected data from Fermi-LAT, the Large
Area Telescope aboard NASA’s Fermi Gamma-ray
Space Telescope. Fermi-LAT constantly
sweeps the sky, and among other things
monitors about 2,000 blazars. These objects
go through periods of increased activity
that can last weeks or months, during
which they become unusually bright.
“When we looked at the region that IceCube
said the neutrino came from, we noticed that this blazar
had been flaring more than ever before,” says
Regina Caputo, an astrophysicist at NASA’s
Goddard Space Flight Center in Greenbelt,
Maryland, who is Fermi-LAT’s analysis
coordinator.

On 28 September, the Fermi-LAT team sent 
out an alert to reveal this finding. It was at that 
point that other astronomers got excited. Ice- 
Cube has detected about a dozen such high- 
energy neutrinos each year since it started 
operating in 2010, but none had been associated
with a particular source in the sky. “That's
what made the hair stand at the back of the
neck,” Fox says. 

Researchers with IceCube and Fermi-LAT 
calculated the odds that the flare and the neu- 
trino were related, rather than coming from 
the same direction in the sky by chance. They 
found that likelihood to be good, although not 
at the level of statistical significance required 
for claiming a discovery in physics (refs 1, 2).

A major missing piece of information was 
the blazar’s distance from Earth, says Simona 
Paiano of the Astronomical Observatory of 
Padua in Italy. To measure it, she and her team 
booked 15 hours of observing time on the 
world’s largest optical telescope, the 10.4-metre 
Gran Telescopio Canarias on La Palma, one of 
Spain’s Canary Islands. They found the blazar 
to be around 1.15 billion parsecs (3.78 billion
light years) away (ref. 3).

Together, the data pinpoint the likely source, 
says Kyle Cranmer, a particle physics and data- 
analysis expert at New York University, but “the 
observation isn’t unambiguous”, he cautions.
“More follow-up is needed to conclusively 
establish blazars as a source of high-energy 
neutrinos.”


1. IceCube Collaboration. Science 361, 147-151 
(2018). 

2. IceCube Collaboration et al. Science 361, eaat1378 
(2018). 

3. Paiano, S., Falomo, R., Treves, A. & Scarpa, R. 
Astrophys. J. Lett. 854, L32 (2018). 


EPIDEMIOLOGY 


Chinese cities scan sewers 
for signs of illegal drug use 


Privacy concerns and cultural differences could limit the technique’s use in other nations. 


BY DAVID CYRANOSKI 


Dozens of cities across China are applying
an unusual forensic technique to
monitor illegal drug use: chemically
analysing sewage for traces of drugs, or their 
telltale metabolites, excreted in urine. 

One southern city, Zhongshan, a drug hot- 
spot, is also monitoring waste water to evaluate 
the effectiveness of its drug-reduction pro- 
grammes, says Li Xiqing, an environmental 
chemist at Peking University in Beijing who is 
working with police in these cities. 

Li says Zhongshan police have already used 
the technique to help track down and arrest a 
drug manufacturer. He says a handful of cities 
are planning to use data from waste water to set 
targets for police arrests of drug users, some as 
early as next year. 

Although illegal drug use has been monitored
through wastewater-based epidemiology
(WBE) in other countries, including Belgium, 
the Netherlands, Spain and Germany, most 
studies have collected data for epidemiologi- 
cal research rather than for setting policies. 
“The noteworthy part is that China seems to be 
actually acting on the technique,” says Daniel
Burgard, a chemist at the University of Puget 
Sound in Tacoma, Washington. 

Last month, Chinese President Xi Jinping 
said that the country’s war on drugs was tied 
to national security and the welfare of the 
Chinese people. Li says the central and local 
governments will invest at least 10 million 
yuan (US$1.5 million) in WBE monitoring by 
the end of the year. He expects the figure to at 
least double annually for the next few years. 

Li is pushing for the method to be used 
internationally, including as part of the United
Nations’ drug control policies. “The experi- 
ence and lessons from the application of WBE 
and its adoption by the Chinese drug police in 
their daily management will be very relevant 
for other countries,” he argues. 

But many issues, such as how police should 
be allowed to analyse the data, the need for 
safeguards to prevent the data from being mis- 
used, and privacy concerns, need to be ironed 
out. Some researchers are sceptical that the 
method will be adopted successfully in other 
countries. 


DRUG USE 

To show that WBE reflects drug use in the
community, a number of studies have compared
drug levels detected in sewage with other
data sources on drug use, such as the amount
of drugs seized by police, and user surveys.
A 2016 study in eight European cities found
a strong correlation between the amount of
cocaine detected in waste water and data from
drug seizures (J. A. Baz-Lomba et al. BMC
Public Health 16, 1035; 2016). However, in the
case of methamphetamines, the correlation was
not as strong.

Chinese officers destroy seized drugs. (Photo: Xinhua News Agency/Shutterstock)

Researchers around the world generally 
agree that WBE can reliably estimate drug use, 
says Shane Neilson, the head of Determina- 
tion for High Risk and Emerging Drugs at the 
Australian Criminal Intelligence Commission 
in Canberra. “The science and findings are
globally consistent and comparable,” he says.
The technique is also used by health research- 
ers to detect other substances excreted by 
humans, such as signs of bacteria and viruses. 

Zhang Lei, an environmental policy 
researcher at Renmin University in Beijing 
who collaborates with Li, notes that WBE 
studies are a more objective way of measuring 
whether government initiatives to reduce drug 
use in the community are working. She says 
that relying solely on conventional methods for 
monitoring changes in drug use, such as the 
number of arrests of users or the number of 
drugs being seized by police, can be misleading
because they are indirect measures. “WBE
offers an unequivocal measure of the effective- 
ness of efforts,” says Zhang. 

Li and his team put this to the test when 
they measured two popular synthetic drugs, 
methamphetamine and ketamine, in waste 
water across China two years after local and 
national agencies launched campaigns to crack 
down on drug use and manufacturing in 2013. 
Zhang’s team found that after these initiatives, 
methamphetamine use dropped by 42% and 
ketamine use decreased by 67%. Li thinks the 
drop in drug use is a result of police campaigns. 


OTHER COUNTRIES 

Jose Antonio Baz-Lomba, a researcher at the 
Norwegian Institute for Water Research in 
Oslo, says the growing evidence that the tech- 
nology is a reliable measure of drug use should 
encourage other international police authorities 
to take WBE seriously and start collaborating 
with researchers. 

But Carsten Prasse, an environmental- 
health researcher at Johns Hopkins University 
in Baltimore, Maryland, argues that cultural 
and political differences between countries 
will have a substantial effect on this research. 
“In China, the general population is used to 
following the directions given by the govern- 
ment, and privacy-related issues don’t seem to 
be a major concern — the situation is totally 
different in the United States,” he says. 

Prasse says the potential implementation of 
wastewater-based drug monitoring needs to be 
discussed in the community, not only between 
scientists and law enforcement. “WBE repre- 
sents a powerful new tool to assess drug con- 
sumption in our cities, but there is still a lot of 
work to do before it can be implemented on a
larger scale,” he says. ■


Experimental open- 
access deal ends 


Science’s pilot contract with the Gates Foundation aimed to 
solve a policy conundrum that affects several journals. 


BY RICHARD VAN NOORDEN 


The publisher of Science last month ended
a pilot partnership that allowed open-access
(OA) publishing for researchers
funded by the Bill & Melinda Gates
Foundation.

The trial was an effort to accommodate a 
policy clash between the Gates Foundation, 
which has enforced strict OA demands since
2017, and publishers running subscription
journals that don't comply with those terms. So 
far, 26 papers have been published in Science 
and 4 sister subscription journals as part of the 
18-month experiment, and more might appear, 
says a spokesperson for Science’s publisher, the 
American Association for the Advancement of 
Science (AAAS) in Washington DC. Neither 
the Gates Foundation nor the AAAS com- 
mented on why the deal ended. 
Under the contract, the Gates Foundation
paid the AAAS a lump sum of around 
US$100,000 for a trial first year, during which 
16 papers appeared. The two organizations 
then extended their partnership for another 
six months, and continued their contract on 
“similar terms”, but have agreed to keep the
extra amount paid confidential, says Bryan 
Callahan, an external-relations officer at the 
Gates Foundation. 

Meanwhile, two other influential journals, 
The New England Journal of Medicine 
(NEJM) and Proceedings of the National 
Academy of Sciences (PNAS), quietly 
changed their policies last year to offer a 
permanent OA publishing route for Gates 
grant holders. And although Nature has not 
made a specific agreement with funders, it 
has published some papers under OA terms, 
including two Gates-funded papers this year. 
(Nature’s news team is editorially independ- 
ent of its journal team and of its publisher, 
Springer Nature.) 

The Gates Foundation, based in Seattle,
Washington, is a global health charity that
spent $4.6 billion in 2016, much of it allocated 
to research. Each year, more than 2,000 papers 
are published from projects it funds. The foun- 
dation stipulates that these papers, and their 
data, must be made open. 

It’s not the only research funder to have 
such rules, but its policy is stricter than most, 
because it demands that papers are made 
free to read immediately on publication, 
rather than permitting a six-month delay as 
some subscription journals require. And the 
papers must not only be free to read, but also 
be posted under a ‘CC-BY’ licence that allows 
their contents to be reused without restric- 
tions, for example through republication, even 
for commercial purposes. When the Gates pol- 
icy came into force at the beginning of 2017, it 
clashed with the rules of subscription journals 
including Nature, Science, NEJM and PNAS, 
meaning that researchers could not publish 
Gates-funded work in these journals. 

In February that year, however, the AAAS 
and Gates announced their partnership. On 
1 March, NEJM changed its own policy. The 
medical journal generally makes articles free 
to read on its website six months after publication,
but it agreed to make Gates-funded
articles free to read immediately, says Jennifer
Zeiss, communications and media-relations 
manager for the NEJM Group. It also agreed 
to simultaneously make available a CC-BY 
licensed ‘author final version’ of the paper, 
which includes revisions made after peer 
review but lacks final NEJM editing. These 
appear online in the PubMed Central database. 
“At present time, NEJM does not have
this arrangement with other funders,”
Zeiss says.

And in September
2017, PNAS — which 
also already makes 
papers free to read on its site six months after 
publication — began offering an OA option 
under a restrictive licence that does not per- 
mit commercial reuse or republication. The 
journal also decided to offer a liberal CC-BY 
licence for authors whose funders mandate it, 
a spokesperson says. 

Nature does not have a specific OA policy 
for Gates grant holders, but the issue is still 
under discussion, and the journal does occa- 
sionally publish papers, which can include 
those with Gates funding, under a CC licence,
says a spokesperson for Nature Research, the
portfolio of journals that includes Nature. The
journal has published more than 30 CC-BY 
OA papers since 2017, according to an analysis 
by Nature’s news team, including the two by 
Gates-funded researchers. 

Peter Suber, director of the Harvard Open 
Access Project and the Harvard Office for 
Scholarly Communication in Cambridge, 
Massachusetts, characterizes the AAAS pilot 
as a compromise whereby Gates paid the 
publisher a “prestige tax” for the specific OA 
terms it wanted. 

“To me, the deal was unnecessary and 
undesirable. A wide range of high-quality 
journals were already compatible with the 
Gates publishing terms. If Gates had refused 
to pay the AAAS prestige tax, it would not 
have lost grant applications from first-rate 
researchers,” Suber says. “I’m glad to see it 
come to an end.” 

Other funders haven't imposed terms as 
stringent as Gates’s, notes Stephen Curry, 
a structural biologist at Imperial College 
London, but he praises the stance. “Gates are 
right to stipulate immediate OA as a condition 
of funding, especially in an area of such importance
to global public health.” ■


ASTRONOMY 


Ten new moons spotted 
orbiting Jupiter 


Planet now has 79 known satellites, including one on a collision course with its neighbours. 


BY ALEXANDRA WITZE 


Astronomers have discovered 10 small
moons orbiting Jupiter, bringing its
total to 79 — by far the most moons
known around any planet. One of the finds is 
an oddball that moves in the opposite direction 
from its neighbours. 
Together, the moons help to illuminate 
the Solar System’s early history. The existence
of so many small satellites suggests
that they arose from cosmic collisions after
Jupiter itself formed, more than 4 billion 
years ago. 

“They did not form with the planet, but 
were likely captured by the planet during or 
just after the planet-formation epoch,” says 
Scott Sheppard, an astronomer at the Carnegie 
Institution for Science in Washington DC. He 
and his colleagues announced the discovery 
on 17 July. 

Sheppard’s team typically hunts for objects
in the very distant Solar System, out beyond
Pluto, and sometimes spots planetary moons 
during these searches. Last year, the group 
reported two additional Jovian moons. In 
this case, the scientists were looking for a 
putative unseen massive planet popularly 
known as Planet Nine. Jupiter was in the 
same part of the sky, so they were able to hunt 
for moons as well. 

To discover new Solar System bodies 
and calculate their orbits, the researchers 
photograph the same part of
the sky weeks or months apart. 
They then look for objects that 
shift position between the two 
images, relative to the back- 
ground stars. The team first 
spotted most of the new Jovian 
moons using the Blanco 4-metre 
telescope at the Cerro Tololo 
Inter-American Observatory 
in Chile, and followed up with 
further observations at other 
telescopes. 


STRANGE SATELLITES 
All the newfound moons are 
small, between about 1 and 3 kilo- 
metres across. Seven of them 
travel in remote orbits more than 
20 million kilometres away from 
Jupiter, and in the opposite direc- 
tion from the planet’s rotation. 
That puts them in the category 
known as retrograde moons. 
The eighth moon stands out 
because it travels in the same 
region of space as the retrograde 
moons, but in the opposite direc- 
tion (that is, in the same direction 
as Jupiter’s spin). Its orbit is also 
tilted with respect to those of the 
retrograde moons. That means 
it could easily smash into the retrograde 
moons, pulverizing itself into oblivion. It 
may be the leftovers of a bigger cosmic colli- 
sion of this nature in the past, Sheppard says. 


These images show the movement of the Jovian moon dubbed Valetudo 
(labelled in yellow) relative to the background stars. 


Jupiter’s moons are named after gods 
with connections to the mythological 
deities Jupiter or Zeus. Sheppard has pro- 
posed naming the oddball Valetudo, after 
one of Jupiter’s descendants, the
Roman goddess of hygiene and 
health. 

The ninth and tenth newfound 
moons orbit closer to Jupiter, mov- 
ing in the same direction as the 
planet. 

Had all these small moons 
formed at the same time as Jupi- 
ter, they probably would have 
been captured by the gas and dust 
still swirling around the newborn 
planet, and have been engulfed. 
Their existence suggests that they 
are leftovers of later collisions 
between space rocks that left the 
debris encircling Jupiter. 

If astronomers can work out the 
history of these collisions, they 
could also determine the sizes 
of any satellites that were pulled 
into the orbit of a young Jupiter. 
“That’s the big question, and that’s 
what makes these ten new moons 
interesting,” says Douglas Hamilton,
an astronomer at the University
of Maryland in College Park.
“How can we link all this to how 
planets formed?” 

Sheppard says there might still 
be a few more moons of Jupiter to 
discover — as yet unseen because 
they were hiding in the Sun’s glare when the 
scientists were looking. Saturn, the runner- 
up to Jupiter in the moon competition, has 
62 known satellites. ■


MEDICAL RESEARCH 


Gene therapy in mouse fetuses 
treats deadly disease 


The method could minimize damage from disease if a condition is diagnosed in utero. 


BY HEIDI LEDFORD 


Gene therapy administered in the womb
could be used to treat a deadly genetic
disease, a study in fetal mice suggests.
The results could add to the increasingly 
popular approach of using prenatal gene 
therapy to minimize the damage wrought 
by some genetic diseases. The US Food and 
Drug Administration approved the first gene 
therapy for adults and children last year, and 
more treatments are crowding pharmaceutical 
pipelines around the world. 
Simon Waddington, the lead author of the 
latest study, says he used to meet with shocked 
stares when he talked about treating fetuses
with gene therapy. “It had gotten to the point
where I’d given up on telling people that fetal
gene therapy is a good idea,” says Waddington,
who studies gene therapy at University College
London. “And now, not infrequently, people
turn to me and say, ‘You know what would be
a good idea? Fetal gene therapy.’”

The mouse study, published on 16 July in 
Nature Medicine¹, uses prenatal gene therapy
to tackle a condition — acute neuronopathic 
Gaucher's disease — caused by mutations in 
a gene called GBA. These mutations disrupt 
the breakdown of a particular fatty molecule,
or lipid. As a result, the lipid accumulates 
in brain cells and other parts of the body, 
contributing to organ dysfunction. 
The study looks at whether the disease can
be treated by using a virus to supply normal 
copies of GBA to a developing fetus. That could 
minimize the irreparable brain damage that 
arises as the lipid accumulates. 

Some forms of Gaucher’s disease can be 
treated by supplying normal copies of the GBA 
enzyme to break down lipids, but that enzyme 
cannot cross from the blood into the brain. 
Children with acute neuronopathic Gaucher's 
disease rarely live past two years. 


DIFFICULT CROSSING 

The condition is so devastating that colleagues 
were sceptical about his team’s ability to treat 
it, says Waddington. “People told me,
‘You’re not going to touch this.’”

One hurdle was simply getting the virus to 
carry the healthy gene into the brain. Viruses 
used in previous tests had to be injected 
directly into the brain, and then they diffused 
only a short distance from the injection site. 
But in 2009, researchers showed that a particular
virus, simply injected into the blood,
could reach the central nervous system. From 
there, it dispersed throughout the brain. 

Waddington began working with mice, 
loading up the virus with a normal copy of GBA 
and looking for ways to express it specifically 
in the central nervous system. His team tested 
its virus in fetal mice carrying GBA mutations 
that cause symptoms similar to neuronopathic 
Gaucher's disease. Such mice normally live for 
only 15 days after birth; treated mice, however, 
survived for at least 18 weeks and were able to 
move about normally. 


FETAL FRONTIER 

The work is impressive, says Tippi 
MacKenzie, a fetal-medicine specialist at 
the University of California, San Francisco. 
MacKenzie has been conducting a clinical 
trial of prenatal stem-cell transplants. “Fetal 
gene therapy or enzyme-replacement therapy 
may be the next frontier,” she says. “It is won- 
derful to see this kind of rigorous research, to 
take us one step further.” 


Treating fetuses has several potential 
advantages. Chief among them is the potential 
to minimize the damage caused by a genetic 
disease. Some of these conditions — such as 
neuronopathic Gaucher’s disease and spinal 
muscular atrophy — can cause irreversible 
symptoms before birth. 

It is also easier to administer some therapies
to the brain in a developing fetus than
in an adult or child, because the blood-brain
barrier — a membrane that prevents
some molecules from crossing into
the brain from the blood — is more permeable. 

“Even one day after birth, it’s harder to get 
into the brain,” says Jerry Chan, an obstetri- 
cian and gynaecologist at Duke-NUS Medical 
School in Singapore. 

And the fetal immune system is also still 
developing, making it less likely to recognize 
the newly expressed protein as foreign. Adult 
immune systems sometimes generate anti- 
bodies against the new protein, which can 
prevent it from carrying out its function. 


WEIGHING THE RISKS 

Chan and others have previously tested fetal 
gene therapy to treat haemophilia in mice and 
macaques, and Chan expects that there will
be interest in doing so for several metabolic
diseases similar to Gaucher's. 

But there are risks. Researchers developing 
a prenatal gene therapy must think not only 
about the fetus, but also about the mother, who 
will inevitably receive a dose of treatment as 
well, says Chan. 

And clinicians have to be absolutely certain 
that the mutation they’ve found will cause dis- 
ease, notes Waddington. This may mean com- 
bining genetic tests with other tests performed 
in utero, to confirm the disorder. 

“We're now at the point where it’s possible 
to diagnose these diseases,” he says. “It’s 
making people think: maybe we should be 
doing this.” ■

CORRECTION

The News Feature ‘Bat man’ (Nature 559, 
165-168; 2018) talked of the Egyptian 
fruit bats under study being collected in the 
Jordanian hills instead of the Judean hills. 
It also incorrectly labelled the picture of a 
neural logger as being a GPS logger. 

SCIENCE
UNDER SIEGE


Uncertainty, hostility and irrelevance are 
part of daily life for scientists at the US 
Environmental Protection Agency. 


BY JEFF TOLLEFSON 


The day Donald Trump took office as US president, the mood
was sombre at the main research campus of the Environmental
Protection Agency (EPA) in Durham, North Carolina.
As scientists arrived for work, they saw pictures of
former president Barack Obama and the previous EPA
administrator, Gina McCarthy, coming down off the walls. Research- 
ers had reason to be anxious: Trump had threatened many times during 
his campaign to shutter the EPA, and he had already taken steps along 
that path. Weeks before he moved into the White House, Trump had 
nominated Scott Pruitt to head the agency — a man who had spent his 
career filing lawsuits to block a variety of EPA regulations. 

When Trump put his hand on the Bible to take the oath of office 
on 20 January 2017, many EPA scientists kept their heads down. They 
wondered who might be fired first, and they warned each other to cen- 
sor their e-mails, for fear that the new administration would monitor 
communications for any comments criticizing it. 

Dan Costa wasn't so worried. After nearly 32 years working at the 
EPA, he had seen the agency weather many political storms, and he 
had not lost sleep over the prospect of working for Pruitt and Trump. 
When inauguration day came, Costa streamed Trump’s speech on his
computer and went straight back to work. 

“There was a lot of fear and anticipation, but I figured we would push 
through it,” says Costa, who at the time headed the department's air, 
climate and energy research programme. 

Over the next 18 months, however, Costa would grow increasingly 
concerned about the Trump administration’s impact on the agency. 
Since assuming power, this administration has launched more assaults 
on the EPA than on any other science agency. The president has sought 
to slash its budget by nearly one-third, and Pruitt’s team has tried to 
weaken the part that science plays in setting environmental regulations.
He barred some top researchers from participating in EPA advisory pan- 
els, and replaced them with scientists who are more friendly to industry. 
All of this has elevated the power of corporations to influence the rules 
that govern chemicals and pollutants. 

But what is it like for the more than 1,000 scientists working at the 
EPA itself? To find out, Nature has conducted dozens of interviews over 
the past year and a half with current and former agency staffers. 

The interviews show that day-to-day work has changed little for many 
EPA researchers. They continue their investigations into everything from 
ecology and toxicology to hydrology and air quality, in an effort to bolster 
the scientific foundations for health and environmental regulations. 

What has damaged researchers’ morale is the endless uncertainty 
about all aspects of their work, and the thinly veiled hostility from the 
administration. It’s the onslaught of media stories about budget cuts, staff 
lay-offs and efforts to weaken environmental and health regulations. It’s 
the ever-growing scent of scandal as Pruitt came under media fire for 
lavish spending with government funds, allegedly using his office to find 
a lucrative job for his wife, among other potential ethical breaches. Pruitt 
denied any wrongdoing, but ultimately resigned on 5 July. 

What most troubles many EPA scientists is the Trump administra- 
tion’s systematic and unprecedented effort to undermine the way in 
which science is used by the agency. Scientists there say they and their 
work have been largely ignored by senior EPA leadership. And despite 
Pruitt’s resignation, few expect the administration’s overarching EPA 
strategy to change once Trump appoints a new administrator. For now, 
the leadership reins fall to Andrew Wheeler, a former coal lobbyist. In a
pair of tweets announcing Pruitt’s resignation, Trump said that Wheeler 
would “continue on with our great and lasting EPA agenda”.

Dan Costa was a scientist at the Environmental Protection Agency for more than 32 years.

Many researchers say that this strategy could subvert the scientific
process altogether and put tens of thousands of lives at risk each year, as
a result of weakened regulations on pollutants and potentially hazardous 
chemicals. 

The turmoil has affected everyone. Most have kept their heads down, 
hoping that science will somehow prevail. Many have censored their 
own language, shunning words such as ‘climate’ or ‘global warming’ 
to avoid attention. Some have delayed retirement to keep the agency 
functioning. Others have quit. 

“There's a lot of fear, a lot of angst and anxiety, and employees don't 
know what to do,” says Kyla Bennett, director of science policy at the
environmental organization Public Employees for Environmental 
Responsibility (PEER) in North Easton, Massachusetts. PEER works 
directly with many government whistle-blowers. “This is unlike 
anything we’ve ever seen,” Bennett says.

Costa has watched the situation deteriorate. As he tried to carry on his 
own work, his mood grew darker and more philosophical. Eventually, he 
realized he had to leave. “They are acting with such impunity, and with 
no accountability,” he says of the administration. “It’s just unfortunate,
and scary.” 


THE FIRST 100 DAYS 

At the beginning of Trump’s presidency, Costa's long history with the 
agency helped him to cope. A toxicologist by training, he joined the EPA 
in 1985 under president Ronald Reagan, looking at the physiological 
effects of pollutants. He arrived shortly after the tenure of Anne Gorsuch, 
a staunchly conservative administrator — much like Pruitt — who had 
slashed budgets and weakened environmental protections during her 
time heading the EPA from 1981 to 1983. Yet Costa watched the agency
slowly bounce back.

That episode served as a 
reminder that the institution 
is larger than any individual, 
Costa told Nature in early 
March 2017, during one of a 
series of interviews initially 
conducted off the record 
because he didn’t have per- 
mission to talk to the press. He 
later agreed to bring the entire 
series on the record. 




censorship and looming budget cuts. Costa said that much of it was 
probably true, but he also stated that such stories can grow out of pro- 
portion. “It’s not like there are memos coming down. It’s just rumours,” 
he said about talk of censorship. “And in the absence of good informa- 
tion, it’s easy for people to create their own demons.” Younger scientists 
had been coming to him for advice, asking whether they should start 
looking for jobs, and his advice was simple: don’t panic.

The Trump administration soon made its intentions clear. On 16 March 
2017, it released a proposal to slash the EPA’s US$8.2-billion budget by
31% and eliminate some 3,200 of the agency's 15,000 positions. 

Among the hardest hit in the budget proposal was the division where 
Costa and some 1,100 other scientists worked: the Office of Research 
and Development (ORD). As the main science arm of the agency, the 
ORD has helped to lay the technical foundation for modern environ- 
mental regulation in the United States. The Trump administration 
had proposed nearly halving its budget from
$483 million to $250 million, which left 
scientists there stunned. 

“Management at all levels are trying to 
reassure employees, but you can’t help but 
worry,” Lesley Mills told Nature at the time.
“These are people who are dedicated to pub- 
lic service, and they feel like they are being 
treated as an enemy,” said Mills, an EPA biologist
in Narragansett, Rhode Island, and a union
representative. 

She and others knew that the planned cuts 
might never happen. Congress has authority 
over the budget in the United States and often 
decides to override presidents’ budget requests. 
And legislators — including many important 
Republicans — were unusually sceptical about 
Trump’s first proposal. Ignoring the administration’s
calls for sharp cuts to the EPA, on 30
April the Republican-controlled Congress 
approved a relatively mild reduction of 1% for 
the remainder of the 2017 fiscal year. 

It felt like a triumph for many scientists, but 
Costa was already beginning to change his 
tone. When he attended an inaugural March 
for Science event in April in Raleigh, two EPA 
scientists with him instinctively ducked and 
threw their hands up to hide their faces when 
a news photographer approached. They told 
him that they didn’t want to encounter ques- 
tions later from political leadership at the agency. 

Costa also found himself encouraging one promising young postdoc 
to apply for a position elsewhere, because he thought EPA jobs were 
unlikely to open up in the next few years. He knew of managers who 
had told younger scientists to take the word ‘climate’ out of document 
headlines. “That sends all sorts of ripples through the organization,” he
said in May 2017. 

At the same time, Costa was making his own changes. He was quietly 
trying to expand the air, climate and energy research programme that he 
ran to advance a new line of science, protect his team and avoid atten- 
tion from higher authorities at the agency. As he sat in meetings and 
drafted reports, he talked increasingly about public health and wildfire 
smoke rather than just the industrial air pollutants that his programme 
had historically focused on. Costa described the proposed shift in sci- 
entific focus as a positive change that would define a useful agenda for 
his programme without limiting the science that it could pursue, in part 
because climate change, air quality and public health are all interrelated. 

“I don’t want to sit back and wait” for any restrictions to be imposed
by political leaders at the agency, Costa said. “I want to occupy the space 
before they do, because they are essentially clueless.” 


A GROWING RIFT: SUMMER 2017 

All the while, Pruitt was busy trying to roll back environmental regula- 
tions put in place by Obama — including regulations that Pruitt had 
challenged while serving as Oklahoma attorney-general. On 28 March, 
Trump authorized Pruitt to repeal landmark regulations intended to 
curb greenhouse-gas emissions from existing power plants. The next 
day, Pruitt declined to ban a powerful pesticide called chlorpyrifos, 
overruling agency scientists who had previously determined that the 
chemical had negative impacts on brain development in children (see 
go.nature.com/2n7pofa). 

What alarmed scientists about these and other actions was not so 
much that Pruitt and Trump were moving in a different political direc- 
tion from the Obama administration; government scientists are used to 
that. But under previous administrations, regardless of political stripe, 
there was at least some deference paid to scientists. 

EPA CHIEF RESIGNS
EPA administrator Scott Pruitt resigned on 5 July.

That all changed with Trump. Pruitt and his senior political
appointees — often dubbed the “politicals” — rarely consult with career 
scientists. In many cases, scientists were left dumbfounded, in part 
because the complete lack of consultation with agency experts could 
end up hurting Pruitt’s own agenda. By bypassing EPA scientists and 
ignoring their findings, his team ran the risk of weakening the EPA’s 
defence in the many lawsuits that states and environmental groups were 
filing against the agency. 

“The politicals literally do not talk to the career people,” says one
senior scientist. That researcher and nearly all active EPA staff 
interviewed for this story sought anonymity because they were not 
authorized to talk to the press. “They just do what they want, and then 
they inform us,” says the senior researcher. 

In an effort to cope with the new reality, another senior official said, 
career scientists looked for areas of common ground with the leadership 
and, in a curious dance, both sides tiptoed around the issue of climate 
change. “It’s like Voldemort, he who shall not be named,” the official
said in mid-2017. 

“There are weeks when everyone in the office is just chugging along 
like normal,” says one mid-level scientist. Inevitably a scandal arises, 
he continued, “and then for a day or two you feel like you are in a fog”. 

Although they carry on with their work, many scientists feel as if 
their efforts don't matter to the top of the agency. Within the Office 
of Research and Development, exchanges with senior EPA leadership 
nearly always go through an intermediary: Richard Yamada. Yamada, 
deputy assistant administrator of the office, was willing to communicate 
ideas up the chain, according to multiple scientists, but he often seemed 
adrift on technical or scientific issues. 

Yamada asked such odd questions during one video conference that 
researchers in the meeting found themselves looking at each other in 
confusion. “You go into these briefings, and you have no idea what the 
questions are going to be.” (The EPA did not grant Nature’s request to 
speak to Yamada and has not responded to multiple requests for com- 
ment on the allegations in this article.) 

The rift between the scientists and EPA leadership was fully exposed 
in late July 2017, when news broke that Pruitt’s team was circulating a 
list of names of climate sceptics. Many assumed the EPA was looking 
for sceptics to participate in a proposed debate about the validity of 
climate science or, potentially, for appointments to science-advisory
positions. The proposal came as the EPA was conducting a technical 
review of a government assessment of current climate science. People 
from both inside and outside the agency had raised concerns about 
whether Pruitt — who as recently as four months earlier had questioned 
the scientific consensus on climate change — and his political appoin- 
tees would meddle with the document. 

Pruitt’s team eventually let the scientific assessment move forward. 
Costa and others gave the agency credit for that decision. “They have 
the authority to slow these assessments down or stop them, if they want,”
he said at the time. “In spite of all of the rhetoric, it’s going through a 
reasonably normal process.” 

For Costa, it was evidence that in many senses, the EPA’s leadership 
doesn't really care about what scientists do — unless and until it gets in 
the way of Trump’s agenda to roll back regulations on industry. But as it 
turned out, the administration was just getting started. 


TENSIONS GROW: AUTUMN 2017 

On 31 October — Halloween, no less — Pruitt dropped a bombshell 
on the scientific community in the United States. He announced that 
scientists with active EPA grants would be banned from serving on the 
agency's main science advisory board (SAB) or on a separate committee
focused on air regulations. Such committees provide peer review of
the science underlying most EPA regulations; Pruitt’s decision prevents 
some of the nation’s top environmental scientists from taking part in 
that process. 

Pruitt justified his action with a damning charge: research grants 
provided by the EPA, he said, could bias scientists and the advice they 
give to the agency. Scientists were shocked because this policy stands in 
sharp contrast to those of other science agencies, such as the National 
Institutes of Health, and also because researchers with industry sup- 
port were not similarly barred from EPA advisory boards. The surprises 
didn’t end there. Pruitt also called for limiting the tenure of board mem- 
bers, which would force even more scientists to cycle off the board. 
Pruitt would thus get to select replacements more quickly. 

As a result, 18 of the 44 members of the science advisory board
are now Pruitt appointees. By the end of September, Trump's team at 
the EPA will have appointed roughly two-thirds of the council, says 
Christopher Zarba, who until his retirement in February managed 
the board's activities at the EPA. Many fear the board will increasingly 
hew to the desires of powerful interests involved in everything from 
chemicals to energy and manufacturing. 

Perhaps most significantly, Pruitt selected Michael Honeycutt to chair 
the SAB. Honeycutt is a toxicologist with the Texas Commission on 
Environmental Quality in Austin, Texas, who has long opposed stricter 
air-quality standards. (Honeycutt told Nature that he hopes he will be 
judged on the basis of the job he does as the chair of the board.) And 
Pruitt appointed Tony Cox, an industry-friendly consultant who has 
challenged scientific studies linking air pollution and human mortality, 
to lead the Clean Air Science Advisory Committee (CASAC). By statute, 
that group must review the science before the agency updates its core 
air-quality standards. 

By the time of those appointments, Costa was already growing weary 
of the attacks on science, but he still saw room to do some good by 
reorienting his programme. Costa had long lobbied to focus more 
research on wildfires because they contribute a large fraction of the fine- 
particulate pollution across the country, he says. The agency, however, 
had devoted its resources for decades to tackling industrial air pollution. 

With Trump and Pruitt in office, Costa thought the time was right to 
give his programme a new mission by including a focus on wildfires. In 
early December, the air-quality regulatory division — the primary cus- 
tomer for the air-research programme — informally endorsed Costa’s 
new research agenda. With that small victory, Costa, who was 69 at the 
time, decided it was time to leave. 

“I certainly didn't want to be a rat jumping ship,” he said. But with five
children, five grandchildren, a new riding lawnmower and a sudden 
dedication to science activism, Costa has more than enough to keep 
himself busy on the outside. “I just didn’t think I would do well in the
current atmosphere.” 

On 5 January, two weeks before Trump celebrated his first year in the 
White House, Costa went into the EPA one last time. His co-workers 
had already thrown him a party — complete with a Beatles-themed 
musical skit. At the end of his final day, Costa packed up the remaining
boxes, turned in his parking pass and headed home. 


PRUITT RESIGNS: SPRING 2018 

Over the ensuing months, more news emerged about Pruitt’s alleged 
ethical transgressions. There were investigations, congressional hearings 
and endless speculation about how long the embattled administrator 
could retain the favour of his mercurial White House boss. In the end, 
Pruitt would stay on for another six months — and drop yet another 
bomb on scientists at the agency. 

On 24 April, Pruitt announced a proposal that would prevent the EPA 
from using any research in its regulatory decisions unless the under- 
lying data and methods are publicly available. He did so in the name of 
transparency, but scientists and other experts immediately fought back. 

The problem, they said, is that privacy restrictions — such as ones 
governing medical records — often limit the data that can be released 
from epidemiological studies, to protect patients’ identities. Pruitt’s 
proposal could therefore eliminate much of the core epidemiological 
research that the EPA has used to help justify air-quality regulations. It 
was, in their view, just another effort to prevent the agency from devel- 
oping meaningful health and environmental regulations. In one analysis 
released in April, a group of former EPA officials found that Pruitt’s 
policy, if implemented two decades ago, could have precluded regula- 
tions that now prevent some 50,000 deaths each year from air pollution 
(see go.nature.com/2zmrmgt). 

When the news broke, Costa was so incensed that he reached out to
Nature from retirement. “Keep your eyes on this: it’s an IED [improvised
explosive device] designed and set to destroy the agency’s ability to do
its job,” Costa wrote in a text message. Pruitt, he continued, “is a
slick bastard”.

A day after the rule was announced, a poster of Pruitt signing the rule,
with grand proclamations about transparency in science, appeared at the
entrance of the ORD’s main building in central Washington DC. For many
scientists, it was yet another insult.

“That poster said, ‘I’ve got you, and there’s not a damn thing you can
do about it,’” says the senior scientist at the EPA. “They are making sure
that we understand that there’s a new sheriff in town.” 

For his part, Costa says he doesn’t have any regrets. He is enjoying 
the summer in a remote stretch of coastal Rhode Island, where he used 
to spend time during his youth. But clearly he hasn't let go — in part, 
perhaps, because he still doesn’t know how the story will end. “The light 
at the end of the tunnel just doesn’t seem to be there,” he said in late May. 

When the news of Pruitt’s departure came down on 5 July, Costa 
was dawdling in the garage. His wife ran out of the house to tell him 
and his mobile phone lit up with texts from friends, family and former 
colleagues at the EPA. Costa was relieved, if not surprised. Looking 
forward, he hopes that Wheeler — who spent four years at the agency 
in the early 1990s — will not be so quick to ignore science and scientists, 
even if he does toe the Trump line. 

And after a few recent conversations with former staff members, 
Costa seems newly encouraged that they will keep the embers burning 
until the political winds shift again and sweep away Trump’s team. “In 
some senses, I think of it like the locusts,” he says. “They come, they wipe 
out the crops and then they leave.”


Jeff Tollefson reports for Nature from New York. 


China is clamping down on unauthorized coal-fired factories, such as this one, to reduce carbon emissions. 


Beat protectionism and 
emissions at a stroke 


Applying carbon charges, not trade tariffs, to imports would bolster 
the Paris Agreement, argue Michael Mehling and colleagues. 


Two huge multilateral issues, free trade and climate change, top
policymakers’ agendas in 2018. This offers a chance to couple them.


More and more countries are shielding domestic producers from foreign
competition, a process known as protectionism. Since January, US President
Donald Trump has slapped tariffs of up to 50% on many imports, including
washing machines, solar cells, soya beans, steel and aluminium. Hopes that
allied countries would be exempt were dashed after a tumultuous G7 meeting
in June. Economies affected have begun to respond in kind. China hit back
with levies on US$34 billion worth of US goods. The European Union
increased tariffs on jeans, motorbikes and bourbon imported from the
United States. And Trump has since threatened to add tariffs on another
$200 billion worth of Chinese goods. A trade war is unfolding.

Meanwhile, nations are reviewing the 
pledges they made to cut emissions as part of 
the 2015 Paris Agreement. Everyone knows 
that current pledges will not keep global 
warming below the ‘safe’ limit of 2 °C above 
preindustrial levels — even if all nations 
deliver on their promises. The question is 
how to strengthen actions so that emissions 
drop sharply once the Paris framework takes 
effect in 2020. 

The Paris process has two main problems. 
First, the pledges are uneven. Countries that 
do little will benefit from hefty cuts made by 
others. In the ultimate free ride, the United 
States will withdraw from the Paris Agree- 
ment in 2020, leaving others to do more. Sec- 
ond, carbon emissions ‘leak’ across borders. 
A country can keep its budget low by buy- 
ing carbon-intensive goods made elsewhere. 
Some regions, such as Western and North- 
ern Europe, import a considerable share of 
high-emission goods, allowing them to emit 
less themselves (see ‘Carbon balance’).

Over the next two years, there will be a 
flurry of activities relating to trade and cli- 
mate change. This is a perfect opportunity 
to tie together the two agendas. 


Turkish steel awaits processing 
in the US state of Texas. 
Governments should levy a carbon charge
on imports. These ‘border carbon adjust- 
ments’ (BCAs) would level the emissions 
playing field by imposing the same economic 
burden on domestic and external manufac- 
turers. Producers would lose the incentive 
to manufacture goods in places with weaker 
carbon regulations. Trade partners would 
then prefer to manufacture and export low- 
carbon products to avoid penalties. 

Political interest in BCAs is growing. In 
2017, French President Emmanuel Macron 
called them ‘indispensable’ for European cli- 
mate leadership, and Canadian environment 
minister Catherine McKenna recommended 
closer scrutiny. Mexico included them in its 
Paris pledge. But there are fears of retalia- 
tion and some confusion over the legality 
of BCAs. Here's how nations could proceed. 


CLIMATE-SMART TRADE 

BCAs should be applied to imported goods 
in line with their carbon footprint (the total 
of all the emissions released during their 
manufacture). A country might impose 
a fee on imported steel if its carbon foot- 
print is higher than that of domestic steel, 
for example. Alternatively, governments 
might require importers to purchase emis- 
sion allowances in carbon markets. The 
US House of Representatives backed such
an approach when it passed the Waxman- 
Markey Bill in 2009; however, the bill failed 
to reach a vote in the Senate. 

California is the only jurisdiction to have 
introduced BCAs, in its energy market. 
Since 2013, electricity delivered into Cali- 
fornia from neighbouring states has been 
subject to the same carbon constraints 
as that generated domestically. This has 
stopped electricity suppliers from shifting 
power generation to states with lax climate 
policies. 

So far, efforts to limit emissions leakage 
have been less efficient. Some countries, 
including Germany, offer regulatory relief 
or compensation payments to domestic 
emitters to persuade them not to relocate. 
But such measures have undermined their 
other climate policies. For example, when 
the EU handed out emissions-trading per- 
mits for free, it weakened incentives to curb 
emissions and produced windfall profits for 
some energy-hungry companies. 

Figure: CARBON BALANCE. Carbon dioxide is released during the manufacture
of goods. The difference in these ‘embedded emissions’ between imports and
exports varies widely, depending on each country's economy and industries
and where it sources goods. Western Europe and North America import from
Asia and Eastern Europe. Therefore the former tend to be net importers of
embedded emissions; the latter, net exporters. Annotation: The United
States has the highest emissions per capita in the world, yet imported
goods have 7% more CO2 embedded than home-made equivalents.

BCAs, by contrast, bolster climate policies. By restricting trade in
carbon-intensive goods, they accelerate decarbonization even in countries
with weak regulation. They also appeal to policymakers, manufacturers,
trade associations and
labour unions who are concerned about a 
nation’s economy and jobs. Imposing BCAs 
on imports from the United States would 
prove politically popular with these groups. 
It would also strengthen the hand of pro- 
gressive US states and cities that produce 
low-carbon goods. And it would get atten- 
tion from targeted countries: US leader- 
ship has already shown that it is sensitive 
to trade measures, as evidenced by its quick 
reaction to retaliatory tariffs. Critics argue 
that BCAs are difficult to implement. They 
point to legal risks and the complexity of 
measuring carbon footprints. 


LEGAL ISSUES 
Some policymakers worry that BCAs violate 
a fundamental principle of the World Trade 
Organization (WTO): non-discrimination. 
This principle limits the ability of trade 
partners to adopt measures that distinguish 
between equivalent domestic and imported 
products. For example, it forbids regulators 
from applying a fee to imported cement 
unless the same fee applies to locally pro- 
duced cement. That helps to avert protection- 
ism. It also prevents countries from arbitrarily 
favouring one trade partner over another. 
Because BCAs distinguish between prod- 
ucts on the basis of their carbon footprint, 
they risk being considered discriminatory. 
For example, producers that make steel in 
open-hearth or blast-oxygen furnaces might 
incur a charge; those that produce the alloy 
in more-efficient electric-arc furnaces might 
not. Similarly, imposing BCAs on trade part- 
ners according to how much carbon they 
produce could be seen as favouring some
countries over others.

Figure annotations (‘Carbon balance’): China has high carbon emissions,
and domestic goods have larger carbon footprints than do imports. India's
exports are typically carbon-intense, because factories use fossil fuels.

However, because BCAs aim to mitigate 
climate change, they would fall under a set 
of exceptions set out in WTO law. These jus- 
tify measures “necessary to protect human, 
animal or plant life or health” or “relating 
to the conservation of exhaustible natural resources”. To meet those
criteria, BCA documentation must spell out the environmental goal, and the
fees must be imposed through a transparent and fair process. Using the
revenue to fund mitigation and adaptation efforts, including in developing
countries, would strengthen their environmental justification (M. Grubb,
Clim. Policy 11, 1050-1057; 2011).


NEXT STEPS 
BCAs should be introduced concurrently 
by a group of countries with the same goals. 
Some of the United States’ trade partners 
could forge a coalition, including, for exam- 
ple, the EU, Canada and Mexico. Likewise, 
China has been strongly hit by US tariffs, 
and has expressed its intention to seek 
new alliances on climate and trade policy. 
Together, these nations have considerable 
economic clout — enough to secure US 
attention. 

As a first step, these countries could base their response to US tariffs
on the carbon intensity of goods. That alone would send a message about the
importance of climate change.

Figure key (‘Carbon balance’, fragment): exports, expressed as a percentage
of domestic emissions from production; no data.

As these countries advance increasingly 
ambitious climate policies, they should 
transition to a BCA that stands on inde- 
pendent footing from the current tariff 
conflict. The design of the programme must 
balance legal durability, ease of implemen- 
tation and environmental performance. 
Below are our recommendations, based on 
a research project concluded last year (see 
go.nature.com/2kdhejm). 


Determine scope and coverage. BCAs 
should target only countries that are not a 
party to the Paris Agreement. If the United 
States follows through with its announced 
withdrawal, its trade partners could impose 
a BCA. Periodic reviews of the adjustments 
should be tied to the global stocktak- 
ing process under the Paris Agreement. 
BCAs should be applied to a limited list of 
imported goods from sectors that emit a lot 
of carbon, such as iron, steel, aluminium, 
oil and gas refining, cement and lime, basic 
inorganic chemicals and pulp and paper. 
That would reduce the administrative bur- 
den, yet still realize many of the environ- 
mental benefits. In the United States, these 
sectors account for roughly half of all manu- 
facturing emissions. 


Calculate the footprint and set the adjust- 
ment. The carbon footprint should include 
direct emissions from production and indi- 
rect emissions from energy and heat inputs, 
and could be based on average benchmarks 
for each sector. Producers of low-carbon goods should be allowed to
document their actual emissions with audited data. The BCA should reflect
the difference in carbon constraints between the imposing country and the
country of origin. For example, if aluminium in the imposing country faces
an average compliance cost of $30 per tonne of carbon emissions, and
imported aluminium is subject only to an average of $10 per tonne in its
country of origin, then the $20 difference would be imposed as a BCA.
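
The aluminium example can be worked through in a few lines. The sketch below is illustrative only: the function names, the zero floor applied when the country of origin already prices carbon more heavily, and the 1,000-tonne shipment are assumptions made for the example, not part of the recommendations.

```python
def bca_per_tonne(importer_cost: float, origin_cost: float) -> float:
    """Border carbon adjustment per tonne of emissions, in dollars.

    Assumption: the adjustment never goes negative, i.e. no refund is paid
    when the country of origin already prices carbon more heavily.
    """
    return max(0.0, importer_cost - origin_cost)


def bca_for_shipment(embedded_tonnes: float, importer_cost: float,
                     origin_cost: float) -> float:
    """Total charge for a shipment with the given embedded emissions."""
    return embedded_tonnes * bca_per_tonne(importer_cost, origin_cost)


# Figures from the text: $30 per tonne at home, $10 per tonne abroad.
rate = bca_per_tonne(30.0, 10.0)
print(rate)  # 20.0

# Hypothetical shipment carrying 1,000 tonnes of embedded emissions.
print(bca_for_shipment(1000.0, 30.0, 10.0))  # 20000.0
```

The embedded tonnes would come either from a sector benchmark or from a producer's audited figures, as described above.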


Ensure the process is fair. Equity, trans- 
parency and predictability are essential to 
legal durability. Countries should notify 
trade partners in advance and discuss the 
details with them. An independent arbi- 
ter could audit the plans before they are 
adopted and determine whether they are 
reasonable. The plan should include pro- 
cedures for appealing. 

The next few years will be a crucial time 
for both trade and climate policy. Trump 
plans to renegotiate the North American 
Free Trade Agreement (NAFTA), having 
already ended US participation in the 
Trans-Pacific Partnership (TPP). The 
United Kingdom and EU must rewrite 
their joint policies on trade and climate 
change. Nations will review their Paris 
pledges with a view to strengthening cli- 
mate ambition. All these processes are 
parts of a larger puzzle.

As the pieces of the jigsaw fall into 
place, momentum must be sustained on 
climate change. Rather than prolong- 
ing the current spiral of tariff tit-for- 
tat, countries should rally and turn this 
incipient trade war into an opportunity 
to ratchet up climate ambition.


Design AI so that it’s fair


Identify sources of inequity, de-bias training data and 
develop algorithms that are robust to skews in data, 
urge James Zou and Londa Schiebinger. 


When Google Translate converts news articles written in Spanish into
English, phrases referring to women often become ‘he said’ or ‘he
wrote’. Software designed to warn people
using Nikon cameras when the person they 
are photographing seems to be blinking 
tends to interpret Asians as always blink- 
ing. Word embedding, a popular algorithm 
used to process and analyse large amounts 
of natural-language data, characterizes 
European American names as pleasant 
and African American ones as unpleasant. 

These are just a few of the many 
examples uncovered so far of artificial 
intelligence (AI) applications systematically
discriminating against specific populations. 

Biased decision-making is hardly unique 
to AI, but as many researchers have noted’, 
the growing scope of AI makes it particu- 
larly important to address. Indeed, the 
ubiquitous nature of the problem means 
that we need systematic solutions. Here we 
map out several possible strategies. 


SKEWED DATA 

In both academia and industry, computer 
scientists tend to receive kudos (from publi- 
cations to media coverage) for training ever 
more sophisticated algorithms. Relatively 
little attention is paid to how data are
collected, processed and organized. 

A major driver of bias in AI is the training
data. Most machine-learning tasks are 
trained on large, annotated data sets. Deep 
neural networks for image classification, for 
instance, are often trained on ImageNet, a set 
of more than 14 million labelled images. In 
natural-language processing, standard algo- 
rithms are trained on corpora consisting of 
billions of words. Researchers typically con- 
struct such data sets by scraping websites, 
such as Google Images and Google News, 
using specific query terms, or by aggregat- 
ing easy-to-access information from sources 
such as Wikipedia. These data sets are then 
annotated, often by graduate students or 
through crowdsourcing platforms such as 
Amazon Mechanical Turk. 

Such methods can unintentionally 
produce data that encode gender, ethnic 
and cultural biases. 

Frequently, some groups are over-repre- 
sented and others are under-represented. 
More than 45% of ImageNet data, which 
fuels research in computer vision, comes 
from the United States’, home to only 4% of 
the world’s population. By contrast, China 
and India together contribute just 3% of 
ImageNet data, even though these countries represent 36% of the world’s
population. This lack of geodiversity partly explains why computer vision
algorithms label a photograph of a traditional US bride dressed in white
as ‘bride’, ‘dress’, ‘woman’, ‘wedding’, but a photograph of a North
Indian bride as ‘performance art’ and ‘costume’.

Photo caption: Algorithms trained on biased data sets often recognize only
the left-hand image as a bride.
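
One way to make the imbalance concrete is to divide each region's share of the images by its share of the world's population. The short sketch below does this with the percentages quoted above; the term "representation ratio" and the function name are labels assumed for this example, and 45% is used as a lower bound for the US share.

```python
# Shares quoted in the text, as percentages.
image_share = {"United States": 45.0, "China and India": 3.0}
population_share = {"United States": 4.0, "China and India": 36.0}


def representation_ratio(data_pct: float, population_pct: float) -> float:
    """Data share divided by population share; 1.0 would mean parity."""
    return data_pct / population_pct


ratios = {
    region: representation_ratio(image_share[region], population_share[region])
    for region in image_share
}
for region, ratio in ratios.items():
    print(f"{region}: {ratio:.2f}x")
```

On these figures the United States is over-represented by a factor of about 11, whereas China and India together appear at less than one-tenth of their population share.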

In medicine, machine-learning 
predictors can be particularly vulnerable to 
biased training sets, because medical data 
are especially costly to produce and label. 
Last year, researchers used deep learning 
to identify skin cancer from photographs. 
They trained their model on a data set of 
129,450 images, 60% of which were scraped 
from Google Images*. But fewer than 5% of 
these images are of dark-skinned individu- 
als, and the algorithm wasn't tested on dark- 
skinned people. Thus the performance of 
the classifier could vary substantially across 
different populations. 

Another source of bias can be traced to 
the algorithms themselves. 

A typical machine-learning program will 
try to maximize overall prediction accuracy 
for the training data. If a specific group of 
individuals appears more frequently than 
others in the training data, the program will 
optimize for those individuals because this 
boosts overall accuracy. Computer scientists 
evaluate algorithms on ‘test’ data sets, but
usually these are random sub-samples of 
the original training set and so are likely to 
contain the same biases. 
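
A toy calculation shows how a single accuracy figure can hide poor performance on a smaller group. The records below are invented for illustration (a hypothetical test set in which group "A" supplies 90% of the examples), and the helper names are assumptions.

```python
from collections import defaultdict

# Hypothetical test set: (group, true_label, predicted_label).
# Group "A" makes up 90% of the records, group "B" 10%.
records = (
    [("A", 1, 1)] * 85 + [("A", 1, 0)] * 5      # model is right on 85 of 90
    + [("B", 1, 1)] * 4 + [("B", 1, 0)] * 6     # model is right on 4 of 10
)


def accuracy(rows):
    """Fraction of rows whose prediction matches the true label."""
    return sum(1 for _, truth, pred in rows if truth == pred) / len(rows)


def accuracy_by_group(rows):
    """Accuracy computed separately for each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)
    return {group: accuracy(members) for group, members in grouped.items()}


overall = accuracy(records)
per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, value in sorted(per_group.items()):
    print(f"group {group}: {value:.2f}")
```

Here the overall accuracy is 0.89, yet the model is correct for only 4 in 10 members of the smaller group. Reporting accuracy per group is one simple way to surface such gaps.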

Flawed algorithms can amplify biases 
through feedback loops. Consider the 
case of statistically trained systems such as 
Google Translate defaulting to the mascu- 
line pronoun. This patterning is driven by 
the ratio of masculine pronouns to femi- 
nine pronouns in English corpora being 
2:1. Worse, each time a translation program 
defaults to ‘he said’, it increases the relative 
frequency of the masculine pronoun on 
the web — potentially reversing hard-won 
advances towards equity’. The ratio of 
masculine to feminine pronouns has fallen 
from 4:1 in the 1960s, thanks to large-scale 
social transformations. 


TIPPING THE BALANCE 

Biases in the data often reflect deep and 
hidden imbalances in institutional infra- 
structures and social power relations. Wiki- 
pedia, for example, seems like a rich and 
diverse data source. But fewer than 18% of 
the site's biographical entries are on women. 
Articles about women link to articles about 
men more often than vice versa, which 
makes men more visible to search engines. 
They also include more mentions of roman- 
tic partners and family”. 

Thus, technical care and social aware- 
ness must be brought to the building of data 
sets for training. Specifically, steps should 
be taken to ensure that such data sets are 
diverse and do not under-represent particular groups. This means going
beyond convenient classifications (‘woman/man’, ‘black/white’, and so
on), which fail to capture the complexities of gender and ethnic
identities.

Some researchers are already starting to 
work on this (see Nature 558, 357-360; 2018). 
For instance, computer scientists recently 
revealed that commercial facial recogni- 
tion systems misclassify gender much more 
often when presented with darker-skinned 
women compared with lighter-skinned men, 
with an error rate of 35% versus 0.8% (ref. 6). 
To address this, the researchers curated a 
new image data set composed of 1,270 indi- 
viduals, balanced in gender and ethnicity. 
Retraining and fine-tuning existing face- 
classification algorithms using these data 
should improve their accuracy. 

To help identify sources of bias, we 
recommend that annotators systematically 
label the content of training data sets with 
standardized metadata. Several research 
groups are already designing ‘datasheets’ 
that contain metadata and ‘nutrition labels’ 
for machine-learning data sets (http://datanutrition.media.mit.edu/).

Every training data set should be 
accompanied by information on how the 
data were collected and annotated. If data 
contain information about people, then summary statistics on the
geography, gender, ethnicity and other demographic information should be
provided
(see ‘Image power’). If the data labelling is 
done through crowdsourcing, then basic 
information about the crowd participants 
should be included, alongside the exact 
request or instruction that they were 
given. 

As much as possible, data curators should 
provide the precise definition of descriptors 
tied to the data. For instance, in the case 
of criminal-justice data, appreciating the 
type of ‘crime’ that a model has been trained 
on will clarify how that model should be 
applied and interpreted. 
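
As a rough sketch of what such standardized metadata might look like in practice, the record below gathers the items listed above into one structure. The class name `DatasetSheet`, its field names and the example values are all hypothetical; they do not follow any published datasheet or nutrition-label schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class DatasetSheet:
    """Hypothetical metadata record accompanying a training data set."""
    name: str
    collection_method: str            # how the data were gathered
    annotation_method: str            # how labels were produced
    # Summary statistics for data about people, e.g. {"geography": {...}}.
    demographics: Dict[str, Dict[str, float]] = field(default_factory=dict)
    annotator_description: Optional[str] = None   # who the crowd workers were
    annotator_instructions: Optional[str] = None  # exact request they were given
    label_definitions: Dict[str, str] = field(default_factory=dict)

    def missing_fields(self):
        """Names of recommended fields that have been left empty."""
        return [key for key, value in asdict(self).items() if not value]


sheet = DatasetSheet(
    name="example-images",
    collection_method="scraped from image search using fixed query terms",
    annotation_method="crowdsourced",
    demographics={"geography": {"United States": 45.4, "Great Britain": 7.6}},
)
print(sheet.missing_fields())
```

A repository or conference could, for instance, decline a submission until `missing_fields()` returns an empty list.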


BUILT-IN FIXES 

Many journals already require authors to 
provide similar types of information on 
experimental data as a prerequisite for 
publication. For instance, Nature asks 
authors to upload all microarray data to the 
open-access repository Gene Expression 
Omnibus — which in turn requires authors 
to submit metadata on the experimental 
protocol. We encourage the organizers 
of machine-learning conferences, such 
as the International Conference on 
Machine Learning, to request standard- 
ized metadata as an essential component 
of the submission and peer-review pro- 
cess. The hosts of data repositories, such as 
OpenML, and AI competition platforms, 
such as Kaggle, should do the same. 

Lastly, computer scientists should strive 
to develop algorithms that are more robust 
to human biases in the data. 

Various approaches are being pursued. 
One involves incorporating constraints 
and essentially nudging the machine- 
learning model to ensure that it achieves 
equitable performance across differ- 
ent subpopulations and between similar 
individuals*. A related approach involves 
changing the learning algorithm to reduce 
its dependence on sensitive attributes, 
such as ethnicity, gender, income — and 
any information that is correlated with 
those characteristics’. 
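
To give a flavour of how a learning procedure can be nudged towards more equitable performance, the sketch below uses a much simpler stand-in than the methods cited: it reweights training examples so that every group carries the same total weight. The function name and the group labels are assumptions made for illustration.

```python
from collections import Counter


def group_balanced_weights(groups):
    """Weight each example so every group carries equal total weight.

    An example from group g gets weight n / (k * n_g), where n is the number
    of examples, k the number of groups and n_g the size of group g.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical group labels: 8 examples from group "A", 2 from group "B".
groups = ["A"] * 8 + ["B"] * 2
weights = group_balanced_weights(groups)

total_a = sum(w for g, w in zip(groups, weights) if g == "A")
total_b = sum(w for g, w in zip(groups, weights) if g == "B")
print(weights[0], weights[-1])  # 0.625 2.5
print(total_a, total_b)         # 5.0 5.0
```

Weights of this kind could be supplied to any training routine that accepts per-example weights, so that errors on the smaller group count as much in total as errors on the larger one.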

Such nascent de-biasing approaches are 
promising, but they need to be refined and 
evaluated in the real world. 

An open challenge with these types of 
solutions, however, is that ethnicity, gen- 
der and other relevant information need to 
be accurately recorded. Unless the appro- 
priate categories are captured, it’s difficult 
to know what constraints to impose on the 
model, or what corrections to make. The 
approaches also require algorithm design- 
ers to decide a priori what types of biases 
they want to avoid. 

A complementary approach is to use 
machine learning itself to identify and 
quantify bias in algorithms and data. We 
call this conducting an AI audit, in which 
the auditor is an algorithm that systematically
probes the original machine-learning
model to identify biases in both the model
and the training data.

IMAGE POWER
Deep neural networks for image classification
are often trained on ImageNet. The data set
comprises more than 14 million labelled
images, but most come from just a few nations.
Share of images by country: United States 45.4%;
Great Britain 7.6%; Italy 6.2%; Canada 3%; other
countries make up the rest.

An example of this is our recent work 
using a popular machine-learning method 
called word embedding to quantify his- 
torical stereotypes in the United States. 
Word embedding maps each English word 
to a point in space (a geometric vector) such 
that the distance between vectors captures 
semantic similarities between corresponding
words. It captures analogy relations, such as
‘man’ is to ‘king’ as ‘woman’ is to ‘queen’.
We developed an algorithm, the AI auditor,
to query the word embedding for
other gender analogies. This has revealed
that ‘man’ is to ‘doctor’ as ‘woman’ is to
‘nurse’, and that ‘man’ is to ‘computer
programmer’ as ‘woman’ is to ‘homemaker’.

“Biases in the data often reflect deep and hidden
imbalances in institutional infrastructures and
social power relations.”
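
The analogy query described above can be sketched with vector arithmetic. The three-dimensional vectors below are invented toy values, not a trained embedding, and `complete_analogy` is a hypothetical helper rather than the authors' auditor. It looks for the word whose vector is closest, by cosine similarity, to b - a + c.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
EMBEDDING = {
    "man": (1.0, 0.0, 0.0),
    "woman": (-1.0, 0.0, 0.0),
    "king": (1.0, 1.0, 0.0),
    "queen": (-1.0, 1.0, 0.0),
    "doctor": (0.6, 0.0, 1.0),
    "nurse": (-0.6, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def complete_analogy(a, b, c, embedding=EMBEDDING):
    """Return the word d that best completes 'a is to b as c is to d'."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(embedding[a], embedding[b], embedding[c])
    )
    # The three query words are excluded so the answer is a new word.
    candidates = (word for word in embedding if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(embedding[word], target))


print(complete_analogy("man", "king", "woman"))
print(complete_analogy("man", "doctor", "woman"))
```

In this toy setting the second query returns a stereotyped completion only because the invented vectors place ‘doctor’ and ‘nurse’ on opposite sides of the first axis. In a real embedding, any such offset is learned from the training text.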

Once the auditor reveals stereotypes 
in the word embedding and in the origi- 
nal text data, it is possible to reduce bias 
by modifying the locations of the word 
vectors. Moreover, by assessing how stereotypes
have evolved, algorithms that
are trained on historical texts can be de-biased.
Embeddings for each decade of
US text data from Google Books from
1910 to 1990 reveal, for instance, shocking
and shifting attitudes towards Asian
Americans. This group goes from being
described as ‘monstrous’ and ‘barbaric’ in
1910 to ‘inhibited’ and ‘sensitive’ in 1990,
with abrupt transitions after the Second
World War and the immigration waves of
the 1980s.
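
One simple way to "modify the locations of the word vectors" is to subtract each vector's component along an estimated bias direction. The sketch below does this for a gender direction taken as the difference between the vectors for ‘man’ and ‘woman’. This projection step is an assumption chosen for illustration and may differ from the method the authors used; the vectors are the same kind of invented toy values as before.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def remove_component(v, direction):
    """Subtract from v its projection onto the unit vector `direction`."""
    scale = sum(a * b for a, b in zip(v, direction))
    return tuple(a - scale * b for a, b in zip(v, direction))


# Invented toy vectors; the first axis plays the role of a gender direction.
man, woman = (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)
gender_direction = normalize(tuple(m - w for m, w in zip(man, woman)))

doctor = (0.6, 0.0, 1.0)
nurse = (-0.6, 0.0, 1.0)

doctor_neutral = remove_component(doctor, gender_direction)
nurse_neutral = remove_component(nurse, gender_direction)

print(doctor_neutral)
print(nurse_neutral)
```

After the projection, the two occupation vectors no longer differ along the gender direction. Deciding which words should be neutralized in this way, and which direction counts as the bias, still has to be settled by people.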

GETTING IT RIGHT

As computer scientists, ethicists, social scien- 
tists and others strive to improve the fairness 
of data and of AI, all of us need to think about 
appropriate notions of fairness. Should the 
data be representative of the world as it is, or 
of a world that many would aspire to? Like- 
wise, should an AI tool used to assess poten- 
tial candidates for a job evaluate talent, or the 
likelihood that the person will assimilate well 
into the work environment? Who should 
decide which notions of fairness to prioritize? 

To address these questions and evaluate 
the broader impact of training data and 
algorithms, machine-learning research- 
ers must engage with social scientists, and 
experts in the humanities, gender, medicine, 
the environment and law. Various efforts are 
under way to try to foster such collaboration,
including the ‘Human-Centered AI’
initiative that we are involved in at Stanford 
University in California. And this engage- 
ment must begin at the undergraduate level. 
Students should examine the social context 
of AI at the same time as they learn about
how algorithms work. 

Devices, programs and processes shape 
our attitudes, behaviours and culture. AI 
is transforming economies and societies, 
changing the way we communicate and work 
and reshaping governance and politics. Our 
societies have long endured inequalities. AI 
must not unintentionally sustain or even 
worsen them.


James Zou is assistant professor of 
biomedical data science and (by courtesy) 
of computer science and of electrical 
engineering, Stanford University, 
California, USA. Londa Schiebinger is the 
John L. Hinds Professor of History of Science 
and director of Gendered Innovations in 
Science, Health & Medicine, Engineering, 
and Environment, Stanford University, 
California, USA. 

e-mails: jamesz@stanford.edu; 
schieb@stanford.edu 

ILLUSTRATIONS BY MARCIN WOLSKI AT DEBUT ART




When Einstein 
Walked with 
Gödel


JIM HOLT
Farrar, Straus and Giroux (2018) 


Here, philosopher Jim Holt 
gathers two decades’ worth 
of reviews and essays, writ- 
ten for The New York Review 
of Books, The New Yorker and 
other publications. These are 
bold, thought-provoking pieces 
that tackle everything from the 
physical principle of least action 
to the mathematics of the infi- 
nitely great and infinitesimally 
small. As a physicist, I was 
aware that the zeros of the zeta 
function describe the distribu- 
tion of the prime numbers. The 
related Riemann hypothesis is 
one of the great unsolved maths 
problems, and Holt helped me to 
understand why it’s important. If 
true, there's a hidden harmony 
to the primes; if false, many sup- 
posed proofs of contemporary 
mathematics would fall with it. 

There is some repetition of 
Holt’s obsessions. These include 
the dichotomy between phi- 
losophies of mathematics, and 
the logician Kurt Gödel’s claim
(during his citizenship hearing) 
that the US Constitution was 
logically inconsistent. Holt has 
revised some pieces. A discus- 
sion of Henry Frankfurt’s On 
Bullshit, dating from the first 
flowering of “truthiness” in the 
administration of president 
George W. Bush, is updated 
with a depressing reference to 
Donald Trump. 

These are stories of real
humans and their mathematical,
physical and philosophical
theories, some of the most
complex ever devised.


Andrew Jaffe is a cosmologist
and head of astrophysics at
Imperial College London.


Health Outcomes
in a Foreign Land


BERNARD KWABI-ADDO
Springer (2017)


In any society, many interconnected
factors contribute to
disparities in chronic non-infectious
diseases such as
diabetes. In a country as complex
as the United States, identifying
and understanding those
interactions is daunting. Cancer
researcher Bernard Kwabi-Addo
has made this effort, uncovering
a vast territory of associated
variables with important
ramifications for health.

This is a multilayered, interdisciplinary
survey of genetic
and non-genetic influences on
health inequities, mainly among
Americans of African descent.
It offers a wealth of comparative
statistics on US citizens,
although based on official racial
and ethnic categories, which
are stereotypical and imprecise.
Intertwined is a literature-based
social commentary on environmental
contributions to discrepancies
in health outcomes,
mainly in cardiovascular disease,
obesity, diabetes, end-stage
renal disease and hypertension
and various cancers.

This is not a deep anthropological
study with original
solutions or historically contextualized
analyses of the health
of African Americans. Rather,
it is a highly readable epidemiological
synopsis from an
under-represented viewpoint,
that of a recent West African
immigrant confronting a country
founded by immigrants. As
such, it offers a new, potential
source of insights.


Fatimah Jackson is professor
of biology and director of the
W. Montague Cobb Research
Laboratory at Howard
University in Washington DC.


Discover deep and illuminating
summer reads picked by our
regular reviewers, leagues away
from lab and lecture hall.


Darwin and the 
Making of Sexual 
Selection 


EVELLEEN RICHARDS 
Univ. Chicago Press (2017) 


In 1871, Charles Darwin pub- 
lished The Descent of Man, and 
Selection in Relation to Sex. 
Sexual selection solved evo- 
lutionary puzzles that natural 
selection could not. Genera- 
tion after generation, peahens 
chose peacocks with the most 
impressive plumage, producing 
the peacock’s uncamouflage- 
able tail. Darwin believed that, 
in a similar way, sexual selec- 
tion (driven by mates selecting 
according to culturally specific 
beauty standards) unlocked the 
mystery of human diversity’s 
origins. 

It troubled Darwin, a privi- 
leged white Victorian man, to 
impute agency to women and 
aesthetic discrimination to non- 
Europeans. His peers rejected 
the theory. But biologists are 
revisiting it. Science historian 
Evelleen Richards's book vividly 
excavates its origins. 

Darwin developed his ideas 
on sexual selection while 
immersed in fields as diverse as 
embryology and pigeon breed- 
ing. Deeply personal matters 
such as choosing his wife, 
Emma, and daily preoccupa- 
tions such as women’s fashions, 
also played a part. In Richards’s 
view, Darwin’s opposition 
to slavery did not, as others 
argue, motivate his work on 
sexual selection. What did was 
his human attempt to answer 
scientific, political, social and 
personal questions. 


Elizabeth Yale is a lecturer in 
the history department at the 
University of Iowa in Iowa City. 
She is the author of Sociable 
Knowledge. 


Ecology and 
Power in the Age 
of Empire 


COREY ROSS
Oxford Univ. Press (2017) 


More and more people now are 
‘resource omnivores’, able to
draw the stuff of their lives — 
whether smartphones or kiwi 
fruit — from across the globe. 
That fundamental change in the 
human relationship with nature 
is often taken for granted. But 
this chronicle by historian 
Corey Ross challenges us to 
acknowledge the imperial roots 
of modern consumer culture. 

In the nineteenth and twen- 
tieth centuries, Europeans 
transformed much of tropical 
Asia and Africa through vast 
plantations, mining operations 
and infrastructure construc- 
tion to support their expanding 
mass-production economies. 
Ross’s brilliant book shows 
how complex and far-reaching 
that transformation was. The 
exploitation of cotton, cocoa, 
rubber, tin and copper led 
many colonizers to worry about 
environmental degradation. 
It also made ‘development’ 
a nearly universal goal for 
post-colonial societies. 

Ross concludes that the 
inequities of empire add to 
the difficulty of solving con- 
temporary environmental 
problems. So do the habits of 
thought that empire spread, 
especially the idea that the 
world is an inexhaustible store- 
house of resources to be grabbed 
and exploited in the quest for 
ever-greater wealth and power. 


Adam Rome teaches history 
at the University at Buffalo, 
New York. He is writing the 
environmental-history volume 
for Oxford’s Very Short 
Introduction series. 


Chemistry 


WEIKE WANG 
Knopf (2017)


Awareness of the mental-health 
issues affecting many PhD 
students is now widespread. So a
novel exploring depression dur- 
ing graduate studies is timely — 
especially one so finely wrought 
as Weike Wang's Chemistry. 

The unnamed Chinese 
American protagonist of this 
debut novel is entrapped by 
parental expectation, a com- 
mitment-obsessed boyfriend 
and a stubbornly unsuccessful 
doctoral project in synthetic 
organic chemistry. She doesn't 
seem to realize how badly she 
wants out of all three until the 
day she finds herself hurling 
glassware onto the lab floor. 
The novel investigates how she 
got to that point psychologi- 
cally, and details her attempts 
to extract herself to find a better 
life beyond. 

Despite the bleak topic, 
Wang’s anonymous protago- 
nist is luminously funny. There 
wasn't a vignette where I didn't 
find myself laughing out loud 
at the gallows humour — or 
ruefully shaking my head at 
thorny issues that all scientists 
will recognize. Those range 
from the mundane (non-stop 
experimental failure) to the 
extreme (lab heads who drive 
their students so ruthlessly 
that they resort to fraud, or 
even suicide). When a student 
tragically kills himself in the 
novel, he is described as “con- 
siderate” because he left his 
flatmates a note on his body: 
“Danger: potassium cyanide. 
Do not resuscitate.” 


Jennifer Rohn is a cell biologist 
at University College London, 
editor of the webzine LabLit.com,
and writer. Her latest
novel is Cat Zero. 


Tell Me the
Planets 


BEN PLATTS-MILLS
Fig Tree (2018) 


Confabulation, ataxia, 
dysarthria, dysphagia, hemi- 
paresis: the mesmerizing 
names of neurological condi- 
tions mask cruelties. But in 
Ben Platts-Mills’s extraordin- 
ary book, the characters of 
people damaged by violence, 
stroke or accident of birth 
outshine the medical details. 

The narrative begins in 
a London charity called 
Headway. Through its doors 
come individuals who have 
maintained their identity, 
however uncertainly. Danny is 
half paralysed after a gangland 
assault; Liah, born infected 
with HIV, has been left almost 
helpless by a brain-damaging 
condition diagnosed too late; 
ambitious computer program- 
mer Matthew found his life 
redirected when a colloidal- 
cyst operation led to memory 
impairment. 

The three are an ex-convict 
Londoner, an Eritrean refu- 
gee and a Nigerian economic 
migrant. But what really 
defines these individuals, 
even as memory fails and 
words elude them, is their 
stubborn vitality, their aware- 
ness of their own condition as 
they face bureaucracy, social 
and medical. They know, and 
they want us to know. 

“How do you tell a story 
about so much loss, about 
disability, about catastrophic 
misfortune ...?” Platts-Mills 
asks. He tells it wonderfully. 


Tim Radford was until 2005 
science editor of The Guardian. 
His book The Consolations
of Physics (or, The Solace of
Quantum) will be published
in August.


19 JULY 2018 | VOL 559 | NATURE | 329 


© 2018 Springer Nature Limited. All rights reserved. 


BOOKS & ARTS 


Pain, Pleasure, 
and the Greater 
Good 


CATHY GERE
Univ. Chicago Press (2017) 


In this thoroughly gripping 
science history of utilitarian- 
ism, Cathy Gere charts the 
trajectory of the ethical theory, 
which hinges on the ‘greatest 
good for the greatest number’. 
For 200 years, utilitarianism 
pervaded much research in 
medicine and psychology: 
pain inflicted on individuals 
was seen as providing broader 
gain and a platform for social 
policy. Reducing aid to the 
poor, for instance, was under- 
stood to ‘save’ society from 
the indolence this supposedly 
encouraged. 

Gere’s engrossing narra- 
tive takes us up to the 1973 
hearings on the notorious 
Tuskegee Syphilis Study. For 
four decades, the US Public 
Health Service had observed 
the progression of the disease 
in hundreds of impoverished 
African American men, who 
were neither told they carried 
it nor given treatment. Medi- 
cal claims of greater good were 
brought crashing down. Yet 
the study’s ethos resurfaces 
in behavioural economics, 
through nudges that, without 
consent, shape the many in the 
mould of the few — supposedly 
‘saving’ us from some inherent
irrationality. Gere rightly
emphasizes that we should be 
wary of ‘noble’ ends justifying 
any means. 


Alex Haslam is professor of 
psychology and Australian 
Laureate Fellow at the 
University of Queensland in 
Brisbane. His most recent book, 
which he co-authored, is The 
New Psychology of Health. 


Adaptive 
Markets 


ANDREW LO
Princeton Univ. Press (2017) 


The idea that financial markets 
are in any way rational or 
efficient seems, to many, absurd 
— not least as we mark the tenth 
anniversary of the 2008 crisis. 
Yet in economics, Eugene Fama’s 
‘efficient-market hypothesis’ 
has a stubborn grip. This holds 
that prices of financial assets 
incorporate all available infor- 
mation, so that those assets will 
always trade at their objectively 
justified value. You can see the 
point: if there were unexploited 
information, somebody would 
trade on it and thus incorporate 
it into the market price. 

In this study, economist 
Andrew Lo proposes an alter- 
native: the adaptive-market 
hypothesis. In stable times, he 
argues, the efficient-market 
insight holds; but when, say, 
a shock bank collapse occurs, 
other kinds of investor behav- 
iour, such as panic selling, can 
kick in. Lo draws extensively on 
neuroscience, psychology and 
evolutionary biology to develop 
his ambitious theory. 

As he acknowledges, there 
is a lot still to understand. But 
it’s a fascinating read, cogently 
situating financial behaviour 
within what we know about 
human behaviour and evolu- 
tionary history. What’s more, 
Lo concludes, his theory can 
inform the regulation of finan- 
cial markets so that they serve 
society by funnelling investment 
to tackle big problems such as 
climate change — rather than 
the socially destructive short- 
term trading that led to the 2008 
meltdown. 


Diane Coyle is Bennett 
Professor of Public Policy at the 
University of Cambridge, UK. 


Trump cannot crush
Iran’s scientists 


US President Donald Trump’s 
unilateral withdrawal from 
the Iran Nuclear Agreement 
(Joint Comprehensive Plan of 
Action; JCPOA) in May attracted 
international condemnation. 
As vice-dean for research in the 
Faculty of Medicine at Tehran 
University of Medical Sciences, 
I stand behind Iran’s scientists, 
who have resolved to work even 
harder to maintain the country’s 
scientific progress (see also 
Nature 557, 287-288; 2018). 
After the imposed war in 
1980-88 and decades of Western 
sanctions, Iran has made 
remarkable advances in research, 
ranking 17th in the world in 
2012. The JCPOA did not have 
much impact on scientific 
productivity, in part because 
many US sanctions remained in 
place. These continued to affect 
the purchase of books, journals, 
lab equipment and materials; the 
payment of publication charges; 
membership of scientific bodies; 
and travel to conferences and 
meetings. Furthermore, the US 
treasury department clamped 
down on publication in US 
journals of papers from Iranian 
government scientists (see 
S. Akhondzadeh Avicenna 
J. Med. Biotechnol. 5, 203; 2013). 
In the face of Trump's 
withdrawal from the JCPOA, 
I hope that the international
scientific community will support 
Iran’s efforts to contribute further 
to international science. 
Shahin Akhondzadeh Tehran 
University of Medical Sciences, 
Iran.
s.akhond@tums.ac.ir 


Exit interviews and 
lab-member awards 


As leader of a large research 
group, I would like to share 
an effective strategy for 
collecting negative feedback and 
constructive suggestions from 
lab members on leadership issues 
(see Nature 557, 294-296; 2018). 
Following the practice of
many commercial companies, I
organize an exit interview with 
every postdoc, graduate and 
undergraduate student when 
they leave the lab. I find that 
people are generally more open 
about problems when they are 
leaving, because they no longer 
have to worry about reactions 
from their seniors or colleagues. 
Identifying likes and dislikes 
from a variety of viewpoints helps 
me to reinforce good practices 
and modify unwelcome ones. 

Another industrial ploy 
I use is to run semi-annual 
votes for the best lab member, 
along the lines of company 
awards for ‘employee of the 
month’. Lab members vote on
three performance criteria: 
helpfulness, work ethic and 
productivity. The person who 
obtains the highest collective 
score from their peers is treated 
to a free lunch. 

Although the winners value 
their peers’ respect over a free 
lunch, the award helps the lab 
establish a culture of helping one 
another, working hard and with 
integrity, and honing scientific 
findings for publication. 

Z. Hugh Fan University of 
Florida, Gainesville, Florida, 
USA. 

e-mail: hfan@ufl.edu 


Evaluation woes: we 
saw it coming 


The cry of anguish from John 
Tregoning asking how his 
research should be judged, if 
not by the journal impact factor 
(Nature 558, 345; 2018), reflects 
a profound malaise in the 
university system. So what did we 
do before journal impact factors 
were invented, when career 
advancement flourished anyway? 
The transition from traditional 
rigorous intellectual assessment 
of research to bibliometric 
indices and box-ticking 
coincided with the transition to 
the corporate university model 
and the rise of the university 
bureaucrat. These administrators 
showed less interest in assessing 
the intellectual merit of research
than in deploying competitive
metrics for the marketplace. 

Governments are much 
to blame because of their 
decreasing budgets for tertiary 
education. However, the 
professoriate (to which I belong) 
should have seen the danger 
these shifts posed sooner and, 
when it did, it should have fought 
harder for the intellectual heart 
of the system. 

Some evidence-based metrics 
are useful. In my view, however, 
a return to the methods of peer- 
driven intellectual assessment 
that worked well for centuries 
should remain part of the 
answer to evaluation woes — 
even though that could mean 
retrieving the system from the 
grasp of university bureaucrats 
and the burgeoning bibliometric 
industry. 

Andrew Beattie Macquarie 
University, Sydney, Australia. 
andrew.beattie@mq.edu.au


Evaluation woes: 
start right 


In our view, we need to move
from a single system for assessing
research performance (see
J. Tregoning Nature 558, 345;
2018) to a prospective model
implemented at the start of a
research initiative. This would 
engage stakeholders in defining 
metrics for the project's mission 
and agenda. 

An example is the European 
Commission’s MULTI-ACT 
project, which is a collective 
research-impact framework of 
multivariate models for health 
research and innovation (see 
go.nature.com/2mdkqgt). This 
integrates conventional metrics 
related to excellence with new 
measures relating to economic 
and financial efficiency and to 
social efficacy. 

Although not the “quick fix” 
Tregoning mentions, such 
multidimensional measures 
should help early-career 
researchers to tie their work 
more effectively to a meaningful 
research agenda. 


Evaluation woes:
metrics beat bias 


We disagree with the contention 
that publication metrics should 
be condemned as the bane of 
research-evaluation practices 
(see J. Tregoning Nature 558, 
345; 2018). In countries with
a long-rooted tradition of
nepotism and patronage, such
metrics provide objective
and consistent evaluation,
particularly advantageous for 
early-career researchers. They 
can also help overstretched 
funding agencies and review 
panels to arrive at fast, fair and 
transparent decisions. 

The conventional combination 
of qualitative review and 
quantitative metrics can be 
expensive and time-consuming, 
not least because it is hard to find 
genuinely impartial reviewers 
and to achieve consensus. 

We acknowledge that misuse 
of metrics such as journal 
impact factors and citation 
counts can discredit creative 
research, encourage citation 
gaming and provoke research 
misconduct. But the striking 
increase in the popularity of 
metrics as an evaluation tool 
worldwide indicates that they 
offer benefits, too. 


IMAGING TECHNIQUES


A record-breaking microscope 


An electron microscope has been developed that produces images at higher resolution than conventional approaches can 
achieve, and is suitable for studying fragile materials that can be damaged by electron beams. SEE ARTICLE P.343 


JOHN RODENBURG 


On page 343, Jiang et al. report the
highest-magnification image ever
obtained using a transmission electron
microscope. The image reveals the atoms in a
two-dimensional self-supporting sheet of a
semiconductor, and has a resolution of 0.39 ångströms;
for comparison, most atoms are about
2–4 Å in diameter. The technique might eventually
allow 2D materials to be examined with
unprecedented precision, providing insight into 
this burgeoning class of useful compounds. It 
might also lead to the development of a method 
that can image individual atoms in 3D objects. 

To generate their image, the authors used a 
method called ptychography (the ‘p’ is silent) in
which radiation — in this case, electrons — is 
passed through a specimen to produce many 
2D diffraction patterns. The basic principle of 
the technique was proposed almost 50 years ago 
by the physicist Walter Hoppe, who reasoned 
that there should be enough information in the 
diffraction data to work backwards to produce 
an image of the diffracting object. However,
it was many years before computer algorithms
were developed that could do this reverse calculation
easily and reliably. The pictures produced
by ptychographic methods are generated
using a computer from a vast amount of indirect
scattering data. An important advantage of
this approach over conventional microscopy is
that it can surpass the resolution limit imposed
by lens imaging. In fact, it can work without any
lenses at all.

Over the past ten years, ptychography has 
been widely used for microscopy in the X-ray,
extreme ultraviolet and visible-light regions
of the electromagnetic spectrum. It has also 
been used with some success with electrons, but 
Jiang and colleagues are the first to show that it 
can surpass the resolution obtained by the best 
electron lenses. For the technique to work, every 
electron must be counted almost perfectly, but 
the scattering patterns contain both extremely 
bright and extremely dark regions (they are said 
to have a high dynamic range), which makes 
it difficult to count every electron without 
any errors. To complicate matters further, the 
experiment must be done as quickly as possible 
to fulfil other experimental constraints, which 
means that about 1,000 diffraction patterns 
must be recorded every second. 

Figure 1 | Improving the resolution of electron microscopy. a, This image of a sheet of molybdenum
disulfide was obtained using annular dark-field electron microscopy, the conventional method for
obtaining extremely high-resolution images of samples. b, Jiang et al. report an electron microscope
that works by analysing the diffraction patterns of electrons that have been transmitted through a sample
(a technique known as ptychography). This method provides the best resolution yet reported for an electron
microscope. Here, the atoms in the sheet of molybdenum disulfide are much clearer. Scale bar, 3 ångströms.


Speed, accuracy and dynamic range are 
conflicting performance parameters in any 
electron detector — achieving them all is dif- 
ficult. All previous electron detectors have 
compromised on one or more of these prop- 
erties. The authors’ main achievement is to 
implement a detector that can handle such 
demanding specifications. 

Remarkably, Jiang et al. gave themselves a 
huge handicap with regard to beating the resolu- 
tion record. For any given microscope lens, the 
best resolution is achieved by using the shortest 
possible wavelength of the radiation or elec- 
tron beams concerned. However, the authors 
used relatively low-energy electrons, which 
have twice the wavelength of those used in the 
highest-resolution lens-based microscopes.
Using low-energy electrons for microscopy 
is good because it greatly reduces the damage 
inflicted on the specimen by the electrons. But 
in this case, it also meant that the resolution 
of the lens used by Jiang and colleagues was 
reduced by a factor of two. To beat the resolution 
record, the authors had to process a particular 
subset of the ptychographic diffraction data (the 
high-angle data), thereby obtaining an image 
with a resolution 2.5 times better than would 
otherwise have been possible. 
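
The wavelength argument above can be checked with a short calculation. The sketch below computes the relativistic de Broglie wavelength of an electron for a given accelerating voltage. The two voltages used, 80 kV and 300 kV, are assumptions chosen for illustration, because the text does not state the beam energies involved.

```python
import math

# Physical constants in SI units.
PLANCK = 6.62607015e-34              # J s
ELECTRON_MASS = 9.1093837015e-31     # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 299792458.0            # m/s


def electron_wavelength_pm(accelerating_voltage):
    """Relativistic de Broglie wavelength, in picometres, of an electron
    accelerated from rest through `accelerating_voltage` volts."""
    energy = ELEMENTARY_CHARGE * accelerating_voltage  # kinetic energy in joules
    rest_energy = ELECTRON_MASS * LIGHT_SPEED ** 2
    # p^2 = 2 m E (1 + E / (2 m c^2)) follows from the relativistic
    # energy-momentum relation.
    momentum = math.sqrt(2 * ELECTRON_MASS * energy * (1 + energy / (2 * rest_energy)))
    return PLANCK / momentum * 1e12


low = electron_wavelength_pm(80e3)    # assumed low-energy beam
high = electron_wavelength_pm(300e3)  # assumed high-energy beam
print(f"80 kV: {low:.2f} pm, 300 kV: {high:.2f} pm, ratio: {low / high:.2f}")
```

Under these assumed voltages the wavelengths come out at roughly 4.2 pm and 2.0 pm, a ratio of about 2. That is in line with the statement that the lower-energy electrons have twice the wavelength, and so halve the resolution available from a given lens.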

Achieving high resolution is not the whole 
story, however. Anyone with poor eyesight 
knows that they need as much light as possible 
if they want to read small print. This is because 
there is an intimate relationship between 
resolution, contrast and the amount of light
illuminating the object. If the small print is light 
grey, not black, then the contrast is low, and 
even more light is needed to read it. The same 
principle is true for an electron microscope. 

Jiang and colleagues used ptychography to 
work out how a particular property of the elec- 
tron waves, known as the phase, changes as the 
waves pass through an object. This informa- 
tion can be used to produce images that have 
strong contrast — even for specimens that 
contain atoms of low atomic number, which are 
difficult to detect with conventional electron- 
microscopy methods that offer very high reso- 
lution. The authors therefore needed relatively 
few electrons to generate their images com- 
pared with other state-of-the-art techniques, 
such as annular dark-field electron microscopy 
(Fig. 1). So not only did they use low-energy 
electrons without compromising resolution, 
but they also used many fewer electrons than 
other techniques do, further reducing the dam- 
age done to the sample. 

Perhaps the most striking feature of Jiang 
and co-workers’ image is not the atoms them- 
selves but the enormous gaps between them. 
The average bond lengths in a material can be 
measured in a bulk sample by using all sorts 
of diffraction and spectroscopic methods, 
but the authors’ image provides an extremely 
precise measurement of the lengths of the 
bonds between individual pairs of atoms, 
which are sensitive to the atoms’ local bonding
environment. But are ultrahigh-resolution
images of gaps between atoms useful for any- 
thing else? 

I think the answer lies in the big success 
story of X-ray ptychography: tomography,
a technique in which lots of 2D images of a 
transparent object are acquired as it is rotated, 
so that a 3D image can be built up. Phase infor- 
mation is an ideal imaging signal for this tech- 
nique. But when images are taken through a 
solid object, the resolution needs to be as high 
as possible to distinguish features lying on the 
top surface from those at the bottom, many of 
which will seem (when seen in projection) to 
be laterally close to one another. 

Jiang et al. tested the resolution of their 
electron microscope by putting two layers 
of atoms on top of one another and measur- 
ing the minimum apparent lateral distance 
between atoms in different layers, some of 
which were almost overlapping. In my view, 
this test demonstrates that their instrument 
could potentially be used for tomography. In
theory, such imaging of multiple layers is not
limited to crystalline 2D materials and could
be used for any complicated, non-crystalline
structure. Unfortunately, for thicker objects, the
electron waves would scatter so strongly that
they would spread out and re-interfere with
each other in complicated ways, which would
make it even harder (although in theory not
impossible) to work out the structure.

Perhaps the take-home message of this
work is not so much the record resolution, or
its applications to 2D materials, but the fact
that it will provide a way of precisely imaging
the 3D bonding of every individual atom in a
solid volume of matter, while using a minimal
flux of damaging electrons. Indeed, the authors
allude to this enticing possibility in their conclusions,
suggesting that the next step is to use
their remarkable detector for tomography.
The aim would then be to solve the exact 3D
atomic structures of solids that have no long-range
order, such as nanocrystalline materials,
glasses and amorphous metals, for which we
must currently infer structures from averaged
bulk measurements.


CARDIOVASCULAR BIOLOGY


Cells stop dividing
to become arteries


An analysis of gene-expression patterns in single cells provides detailed insights
into the developmental processes that lead to maturation of the coronary
arteries. SEE ARTICLE P.356


ARNDT F. SIEKMANN


The human heart pumps between about
5 and 20 litres of blood through the
body every minute. To receive enough
oxygen to fulfil this tremendous task, heart-muscle
cells need their own blood supply.
This is provided by specialized blood vessels,
including coronary arteries. Defects in these
arteries can lead to coronary heart disease and
even heart attack. Understanding how coronary
arteries form during embryonic development
is therefore of great interest, because such
knowledge might help in developing strategies
to prevent or treat coronary heart disease. On
page 356, Su et al. provide a detailed picture of
the sequence of events that leads to coronary
artery development.

The cells that generate coronary arteries
originate from various regions of the embryo,
including a sac-like structure called the sinus
venosus that adjoins the embryonic heart.
From these sites, the cells invade the heart’s
muscle-cell layer. Here, they form an immature
blood-vessel network called a plexus that
is subsequently remodelled into functional
arteries and veins.

Su and colleagues set out to investigate
how cells from the sinus venosus develop
into coronary arteries, using single-cell RNA
sequencing (scRNA-seq), a technique that
enables precise identification of the genes
being expressed in each cell of a tissue. Gene-
expression patterns change during tissue dif- 
ferentiation, for example as sinus venosus cells 
mature into coronary arteries. Comparison of 
the gene-expression patterns for individual 
cells of a given type can therefore reveal the 
cells’ relationships to one another. 
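
As a rough illustration of how comparing expression patterns can suggest relationships between cells, the sketch below correlates made-up expression profiles. Everything in it is an assumption for illustration: the numbers are invented toy values, the four "genes" are hypothetical, and Pearson correlation stands in for the far more elaborate bioinformatic methods used in real scRNA-seq analyses, which the article does not specify.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

# Made-up expression levels for four hypothetical genes (two vein-like,
# then two artery-like). These are NOT measurements from Su et al.
cells = {
    "sinus_venosus": [9.0, 7.0, 1.0, 0.5],
    "plexus":        [6.0, 5.0, 3.0, 2.0],
    "pre_artery":    [1.0, 2.0, 8.0, 9.0],
}

def most_similar(name):
    """Return the other cell whose profile correlates best with `name`."""
    others = (other for other in cells if other != name)
    return max(others, key=lambda other: pearson(cells[name], cells[other]))

print(most_similar("sinus_venosus"))
```

In this toy example the sinus venosus profile sits closest to the plexus profile. Similarity of this kind suggests, but does not prove, a developmental relationship.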

The authors extracted single endothelial 
cells, which make up the inner lining of blood 
vessels, from the hearts of mouse embryos at a 
developmental time point just before coronary 
artery formation. They reasoned that, at this 
embryonic stage, they would obtain cells at the 
various stages leading to coronary artery matu- 
ration, including sinus venosus and plexus cells. 
They then used bioinformatics to investigate 
the lineage relationships between these cells. 

Figure 1 | Coronary artery development starts early. a, During the development of mouse embryos,
cells from a sac-like structure called the sinus venosus migrate into the muscle-cell layer of the heart.
b, There, they give rise to an immature blood-vessel network (a plexus), which will be remodelled to form
arteries, veins and capillaries. Su et al. have shown that a subpopulation of immature plexus cells, which
the authors dub pre-artery cells, have a gene-expression profile that is characteristic of mature arteries.
The transcription factor COUP-TF2 prevents plexus cells from adopting this profile. Pre-artery cells
predominantly give rise to mature coronary artery cells, although a few become part of capillaries instead.
(Figure adapted from Fig. 4h of ref. 4.)

It has been thought that the remodelling of
the plexus into arteries and veins starts only
after the plexus has connected to the aorta 
(the main heart artery), and therefore after 
the onset of blood flow. But, unexpectedly,
Su et al. found that several cells from their 
embryos, in which the plexus had not yet 
received blood, had a gene-expression profile 
associated with mature arteries. They called 
these cells pre-artery cells. 

The authors used a genetic strategy to 
indelibly label the pre-artery cells with a 
marker protein, such that these cells and the 
lineages they give rise to could be tracked 
during embryonic development. This 
lineage tracing revealed that, although most 
pre-artery cells did go on to form coronary 
arteries, some were incorporated into capil- 
laries, which connect coronary arteries with 
veins. Thus, it seems that, although certain 
endothelial cells are genetically predisposed 
to form arteries, they also have a degree of 
developmental plasticity (Fig. 1). 

Next, Su and colleagues performed a 
detailed analysis of the gene-expression 
patterns of cells on the developmental spec- 
trum from sinus venosus to pre-artery cells. 
Changes in gene expression towards more 
arterial-like profiles occurred only gradually 
along most of the spectrum. However, there 
was a sharp change as cells crossed a threshold 
to adopt a pre-artery state. The researchers 
showed that the greatest difference in expres- 
sion in pre-artery cells compared with other 
cells in their analysis occurred in genes impli- 
cated in regulating the cell cycle. Furthermore, 
in mouse embryos, pre-artery cells proliferated 
less than did cells in the plexus. Thus, limit- 
ing cell divisions might be a prerequisite for 
coronary artery maturation. 

Indeed, the authors found that overexpres- 
sion of the transcription factor COUP-TF2 
in mice inhibited pre-artery formation by 
upregulating cell-cycle genes. COUP-TF2 
was previously thought to limit the growth of 
arteries by suppressing the Notch signalling 
pathway. But Su et al. showed that activation
of Notch signalling could not prevent the
defects caused by COUP-TF2 overexpression
in mouse embryos. By contrast, pharmacological
inhibition of the cell cycle increased artery
formation in an ex vivo experiment. Thus,
COUP-TF2 has functions in artery development
that are independent of Notch signalling.
Together, Su and colleagues’ work provides 
exciting insights into coronary artery forma- 
tion. It will be interesting to discover whether 
the findings apply to artery development in 
other settings. 

It will also be valuable to delineate the 
signalling pathways that lead certain cells 
to adopt the gene-expression profile of pre- 
artery cells. The Notch signalling pathway, 
acting independently of COUP-TF2, is a 
prime candidate. Studies in several developmental
settings have suggested that new
blood-vessel sprouts initially emanate from
veins and capillaries and only subsequently
form arteries, with each activity inhibiting the
other. Inhibiting Notch signalling can lead to
excessive blood-vessel sprouting, while at the
same time preventing artery formation,
and a similar effect is seen during coronary
artery development. It will therefore be
interesting to investigate how Notch signal- 
ling affects the gene-expression profiles that 
lead to the formation of pre-artery cells. 

Although it is becoming increasingly clear 
that artery differentiation is intimately linked to 
cell-cycle state, the underpinnings of this rela- 
tionship need further investigation. A report 
last year showed that cell-cycle inhibition is 
important for proper arterial gene expression in 
cells in vitro. In addition, the signalling pathways
involving Notch and vascular endothelial
growth factor, which are indispensable for the
establishment of new blood-vessel networks,
are both implicated in influencing endothelial-cell
proliferation. However, it is not known
how cells interpret signalling inputs from these 
pathways to balance the demand for prolifera- 
tion of cells in the blood-vessel network with 
the need to establish new arteries. 

One must also bear in mind that cell-lineage 
trajectories obtained from scRNA-seq might 
not reflect true developmental relation- 
ships. For instance, cells that have similar 
gene-expression patterns are not necessarily 
derived from the same precursor population. 
New techniques that unite cell-lineage tracing
with scRNA-seq will help to bridge this gap,
and will surely provide further insights into
coronary artery development.


Newfound differences
between great apes 


High-quality genome sequences for some of the great apes have been assembled 
using state-of-the-art sequencing tools. The assemblies provide an unbiased 
comparison between humans and their closest evolutionary relatives. 


AYLWYN SCALLY 


Much of evolutionary biology is
motivated by the principle that
you cannot understand one species
without comparing it with another. When 
nineteenth-century naturalists compared 
the anatomies of humans and other apes, it 
became clear that these species shared many 
features and had evolved from a common 
ancestor. More recently, developments in 
DNA sequencing (which enabled assembly
of the human genome in 2001, followed by
lower-quality ‘draft’ genomes for other great
apes) have transformed our understanding
of this evolutionary process. Writing
in Science, Kronenberg et al. describe new
great-ape genome assemblies, generated using 
a technology that surpasses previous methods.
This work marks a new stage in our ability to
study and compare these species. 

Genome assembly is often likened to piecing 
together a jigsaw puzzle — a huge jigsaw for 
which the box has been lost and we have only a 
vague idea of what the whole should look like. 
The analogy holds because sequencing technol- 
ogies cannot sequence an entire chromosome in 
one go. Instead, they fragment the genome into 
many separate pieces, called reads, which have 
to be matched, overlapped and placed together. 

Figure 1 | A Sumatran orangutan and her baby. Kronenberg et al. assembled high-quality genome sequences for a chimpanzee and an orangutan, and
compared these with the human genome to look for evolutionary differences.

Previous generations of sequencing
machines produced reads that were only about
a hundred base pairs long, or perhaps a thousand
base pairs but at exorbitant cost. Current
machines such as Pacific Biosciences’ single-molecule
real-time (PacBio SMRT) sequencer
produce reads tens of thousands of base pairs in
length. Even with this improvement, hundreds
of thousands of reads are needed to span a
genome of three billion base pairs such as that
of humans. Moreover, in practice, a large excess
is used (typically more than 30 genomes’ worth) 
to mitigate errors and resolve overlap ambigui- 
ties. A further complication arises from the fact 
that genomes are filled with stretches of DNA 
in which the same pattern is repeated many 
times, either in series or scattered throughout 
the genome. In apes, such repetitive DNA com- 
prises a substantial fraction of the genome. 
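
The arithmetic behind these read counts can be sketched in a few lines. This is a back-of-the-envelope illustration, not a description of any assembler: the function name is hypothetical, the read length of 10,000 base pairs is an assumed value at the low end of "tens of thousands", and the calculation ignores read-length variation and overlap requirements.

```python
import math

def reads_needed(genome_bp, read_bp, coverage=1.0):
    """Reads whose combined length equals `coverage` times the genome size."""
    return math.ceil(genome_bp * coverage / read_bp)

GENOME_BP = 3_000_000_000  # about three billion base pairs (from the text)
LONG_READ_BP = 10_000      # assumed long-read length
SHORT_READ_BP = 100        # "about a hundred base pairs" (from the text)

print(reads_needed(GENOME_BP, LONG_READ_BP))       # one genome's worth
print(reads_needed(GENOME_BP, LONG_READ_BP, 30))   # 30 genomes' worth
print(reads_needed(GENOME_BP, SHORT_READ_BP, 30))  # same excess, short reads
```

Under these assumptions, a single pass over the genome takes 300,000 long reads, a 30-fold excess takes 9 million, and the same excess with hundred-base-pair reads takes 900 million, which gives a sense of how much larger the jigsaw becomes when the pieces are small.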
Because of these difficulties, the first great- 
ape genome projects used the human genome 
as a scaffold to help assemble genomic 
regions that are structurally similar to those 
of humans — that is, in which correspond- 
ing stretches of DNA lie in the same order 
and are present in a similar number of copies. 
This strategy enabled better assembly in such 
regions. But in regions where genome struc- 
ture has evolved very differently in humans 
and other great apes, the great-ape draft assem- 
blies tended to be more fragmented, and the 
resulting variation in assembly quality effec- 
tively constituted a bias towards the human 
genome. These assemblies provided many 
evolutionary insights, but there has nonethe- 
less been a deficit in our understanding of the 


genomic elements that make humans unique. 
One reason why structural variation is 
important, particularly on the short evolution- 
ary timescale that separates humans and other 
great apes, is that it provides a way for genomes 
to evolve rapidly. When a whole chunk of 
DNA is removed or duplicated, its molecular 
function can be inhibited or enhanced in one 
step, rather than through successive mutations
at individual bases. Indeed, much of the
great-ape genome seems to be modular in nature,
and is therefore susceptible to the kind of
building-block alteration that structural variation
allows. It is also thought that gene loss is a key
mechanism for evolutionary change. This might seem
counterintuitive, but genes often act to constrain,
rather than promote, a particular function. 
Disabling them by removing, duplicating or 
relocating a chunk of DNA might be the sim- 
plest way to confer beneficial effects. 
Kronenberg et al. used PacBio SMRT 
to assemble high-quality genomes for a 
chimpanzee and an orangutan, along with two
human genomes for comparison (Fig. 1). The
long reads enabled them to do away with the 
human-genome scaffold used previously, and 
to increase the typical distance between gaps by 
about 100-fold compared with previous assem- 
blies. The authors found about 600,000 struc- 
tural differences between these genomes and 
that of humans, including more than 17,000 
differences specific to humans. Of these, many 
changes disrupt genes in humans that are not 
disrupted in other apes. Genes whose activity 
is suppressed specifically in humans are more 
likely than other genes to be associated with a 
human-specific structural variant. 

Many genes produce multiple versions, 
called isoforms, of the protein they encode, 
each of which can have a different role. 
Kronenberg and colleagues found evidence 
that one human-specific structural change — 
a large deletion in the gene FADS2 — might 
have altered the distribution of isoforms the 
gene produces. These isoforms are involved 
in the synthesis of fatty acids needed for 
brain development and immune response,
and are difficult to obtain from a purely her- 
bivorous diet. Correspondingly, FADS2 has 
been a target for natural selection associated
with dietary changes towards or away
from animal fats in recent human evolution.
Chimpanzees eat a small amount of meat, so 
it is not known what (if any) human-specific 
traits might have resulted from this deletion, 
but it does suggest that shifting dietary pat- 
terns could have been a feature of human evo- 
lution over long timescales. 

Structural variation also seems to have 
had a role in brain evolution. Human brains 
are much larger than those of other apes, 
and it is plausible that genes involved in 
brain growth and development were key to 
the evolution of this trait. The authors ana- 
lysed the sequences of genes that are active 
in radial glial cells, which are progenitors for 
neurons and other cells in the brain’s cortex, 
and compared protein production by these 
genes in humans and chimpanzees using cor- 
tical organoids — 3D models of brain tissue 
grown in vitro. These analyses revealed that 
41% of genes whose activity is suppressed in 
human radial glial cells are associated with a 
human-specific structural variant. Again, this
is consistent with structural genomic changes
causing disruption or loss of gene function 
during great-ape evolution. 

Intriguing as Kronenberg and colleagues’ 
findings are, there is also a broader significance 
to their work. Several groups and consortia 
are applying new sequencing technologies 
to different organisms. Ultimately, research- 
ers want accurate, high-resolution assemblies 
for all species, and to compare these genomes 
on an equal footing. This will improve evolu- 
tionary analyses and reveal complex mutation 
processes that have hitherto been obscured. 
Large genome assembly currently remains 
hugely expensive, and even state-of-the-art 
sequencing tools struggle to resolve repetitive 
sequences on scales above a few hundred thou- 
sand base pairs, making assembly of certain 
genomes challenging. But tools to read whole 
genomes with negligible errors on inexpen- 
sive hardware are not far away, and are almost 
available for small bacterial genomes.

It is clear that we are leaving behind the
initial period of evolutionary genomics, in
which analyses involved comparing a genome
of interest to a few ‘gold standard’ genomes,
such as human, mouse or zebrafish. Instead,
we are moving towards a more complete and
equable genomic view of life.


Immune link to failure
of cancer treatment


Prostate-cancer treatment usually fails after time as resistance to therapy
develops. It emerges from studies of mice and human cells that a population of
immune cells can cause this type of treatment resistance. SEE ARTICLE P.363


MATTHEW D. GALSKY


Prostate cancer causes more than
300,000 deaths annually worldwide and
is one of the most common causes of
cancer-linked mortality in men. In 1941, the
demonstration that the condition regressed after
surgical castration established a link between
prostate-cancer growth and androgens (the
hormones, such as testosterone, that are mainly
generated in the testes and aid the development
of male characteristics). The current standard
treatment for advanced-stage prostate cancer
is androgen depletion by chemical means.
However, this almost invariably provides only
a temporary halt to the disease. When cancer
progression resumes despite low androgen levels,
the condition is known as castration-resistant
prostate cancer. On page 363, Calcinotto et
al. report that the action of immune cells can
drive this type of treatment resistance. This
discovery could pave the way to new therapeutic
options, and illuminates our understanding of
the spectrum of interactions in the
prostate-cancer microenvironment.

The androgen receptor is a protein that
can regulate gene expression.
Androgen-deprivation therapy can lead to prostate-cancer
regression because the absence of
androgen-mediated signalling causes cancer cells to die
or cease dividing. The resulting decrease in
tumour size is often monitored in the clinic by
tracking a decline in the level of a protein called
prostate-specific antigen (PSA). Although it
was originally thought that castration-resistant
prostate cancer arises through mechanisms that
are independent of androgen-mediated signalling,
certain observations have challenged that.
PSA expression is regulated by the androgen
pathway, and an increase in the level of PSA
almost always accompanies the development of
castration-resistant disease. Moreover, clinical
improvement can occur when people undergoing
androgen-deprivation therapy are also
given extra treatments that hamper androgen
signalling.

A range of mechanisms underlying
castration-resistant prostate cancer have been
reported, and several causes probably contribute
to disease progression in any given
individual. The identification of mechanisms
associated with disease progression has led
to the development of associated treatments.
For example, up to half of castration-resistant
prostate cancers are accompanied by
an increase in the number of copies of the
androgen-receptor gene. This, along with
other clues suggesting that androgen-receptor 
expression correlates with disease progression, 
has stimulated the generation of drugs that can 
inhibit the androgen receptor. Castration
resistance can also occur through an increase
in the expression of enzymes that synthesize
androgens, and this discovery led to the development
of drugs that inhibit androgen biosynthesis.
However, resistance to both these
types of inhibitor can develop, so there is still a 
need for additional clinical strategies. 

Calcinotto and colleagues investigated 
whether immune cells might aid the develop- 
ment of castration-resistant prostate cancer. 
They focused on an immune-cell population 
known as myeloid-derived suppressor cells 
(MDSCs), which includes monocytes and neu- 
trophils that might be in an immature state of 
abnormal activation. MDSC presence is linked 
to poor prognosis for patients who have prostate
cancer, although this connection has been
attributed to the suppression of an anticancer 
immune response. Calcinotto et al. observed 
that tumour biopsies from people who have 
developed castration-resistant prostate cancer 
contain more MDSCs that express the proteins 
CD11b, CD33 and CD15 than do samples from 
people whose prostate cancer has not pro- 
gressed to the castration-resistant stage. 

The authors investigated whether MDSCs 
might contribute directly to castration resist- 
ance, using human cell samples and mouse 
models of prostate cancer. In mice, the authors 
found that surgical castration resulted in 
an increase in the recruitment of MDSCs 
to tumours, compared with the recruit- 
ment observed in control animals given a 
mock operation. Calcinotto and colleagues 
grew mouse MDSCs in vitro and isolated 
samples of the culture medium. When this 
medium was added to androgen-dependent
prostate-cancer cell lines cultured in vitro
under androgen deprivation, it sustained
the proliferation and survival of the cells
and caused an increase in the transcription
of genes whose expression is driven by the
androgen receptor.

Figure 1 | An immune cell drives treatment resistance in prostate cancer.
Calcinotto et al. used mouse models and human clinical samples to
investigate how prostate cancer evades therapy. a, If male hormones called
androgens bind to the androgen receptor (AR), this can drive the expression
of genes that promote prostate-cancer growth. b, A standard treatment
for the disease is chemical castration, in which drugs are used to decrease
androgen levels. However, the subsequent slowing of tumour progression
is not permanent. c, When tumour growth returns, the condition is called
castration-resistant prostate cancer. Calcinotto and colleagues found that a
type of immune cell called a myeloid-derived suppressor cell (MDSC) can
cause this treatment failure. If an MDSC cell secretes a protein called IL-23,
this might bind to the IL-23 receptor (IL-23R) on tumour cells. This binding
triggers a pathway in the tumour cell mediated by the proteins RORγ and
STAT3 (the latter is phosphorylated; P is a phosphate group), which can drive
AR expression. This increase in AR expression helps to drive the
androgen-dependent gene expression that boosts prostate-cancer growth.

The authors carried out equivalent experiments
using human cells and made similar
findings. Furthermore, the use of pharmacological
techniques to deplete MDSCs delayed
the emergence of castration resistance in mice.
Together, these results suggest that MDSCs
secrete a factor that drives the emergence of
castration-resistant prostate cancer.

To identify this key factor, the authors took
samples of tumours and associated cells from
castrated mice and from animals that underwent
a mock operation, and searched for
the genes that showed the greatest increase
in expression in the samples from castrated
mice. Their results included the gene encoding
IL-23 (an immune signalling protein called
a cytokine) and a gene that encodes a subunit
of the receptor to which IL-23 binds. Analysis
of prostate-cancer specimens from the clinic
confirmed the importance of IL-23, and there
were more IL-23-expressing MDSCs in
castration-resistant prostate-cancer specimens
than in specimens from tumours that were
not castration resistant.

Calcinotto and colleagues propose that
signalling mediated by MDSC-secreted IL-23
and by the IL-23 receptor on prostate-cancer
cells promotes the development of
castration-resistant prostate cancer. Using pharmacological
or genetic approaches to block IL-23-mediated
signalling in mice, they obtained evidence
that such treatment delays the development of
castration-resistant prostate cancer.

The authors carried out studies to determine
the signalling pathway downstream of IL-23
that mediates the return of tumour growth,
and focused on two IL-23-regulated proteins
(STAT3 and RORγ) that are part of a pathway
that boosts androgen-receptor signalling.
Their results are consistent with a model
in which IL-23-mediated activation of the
STAT3-RORγ pathway leads to an increase
in expression of the androgen receptor and an
increase in expression of genes whose transcription
depends on that receptor (Fig. 1).
Strikingly, the authors demonstrated that if
mice that had developed castration-resistant
prostate cancer were given an antibody that
blocks IL-23 and an androgen-receptor
inhibitor called enzalutamide, this reversed
castration resistance and caused the animals’
tumours to shrink.

Calcinotto and colleagues’ work has
important clinical implications and advances
our understanding of the biological processes
that underlie castration resistance. Antibodies
that block IL-23 are approved for clinical use
to treat autoimmune conditions, which clears
the way for them to be tested as a possible treatment
for castration-resistant prostate cancer.
The findings also raise the question of whether
immune cells might contribute to the progression
of other sorts of cancer in which growth is
driven by hormone-receptor signalling. Some
controversy currently exists about whether
MDSCs are a cell population that is distinct
from normal neutrophils and monocytes,
given that MDSCs are highly similar to those
cells, yet are considered functionally different.
There is not yet a clear consensus about how
to identify MDSCs on the basis of the expression
of cell-surface markers. This issue could
affect future studies of these cells.

As with any disease mechanism studied
mainly in animal models, the prevalence of
possible MDSC-associated castration resistance
in humans remains to be determined. It
might be a minor mechanism in most patients,
a major mechanism in a minority of patients or
somewhere in between. The scale of the effect
of MDSCs, and the ability to select the specific
people likely to respond to treatment targeting
castration-resistant tumours, will probably
be crucial in determining whether such
therapy against prostate cancer is successfully
implemented in the clinic.


CLIMATE SCIENCE


Atlantic slowdown 
boosts surface warming 


The circulation system of the North Atlantic Ocean has weakened and is 
predicted to weaken further in the coming decades. An analysis suggests that this 
decline could lead to accelerated global surface warming. SEE LETTER P.387 


GERARD D. MCCARTHY & PETER W. THORNE 


Global surface temperatures rose steadily
from 1975 to 1998, but this growth
then slowed somewhat for about
15 years, an event that gained popular attention
as a ‘hiatus’. Since then, we have experienced
the four warmest years on record, which
has served to dampen popular interest in the 
event. However, because climate change is a 
complex response to slowly varying external 
drivers, it is important to fully understand past 
climate behaviour and the underlying causes. 
On page 387, Chen and Tung report that the
system of ocean currents known as the Atlantic 
Meridional Overturning Circulation (AMOC) 
can explain changes in rates of 
global surface warming. Rather 
than the conventional picture of a 
vigorous AMOC associated with 
elevated surface temperatures in 
the Atlantic Ocean, the authors 
emphasize the role of the AMOC 
in taking heat from the surface and 
storing it in the deep ocean. 

The connection between the 
AMOC and variations in the heat 
content of the subpolar North 
Atlantic Ocean has long been 
acknowledged. The AMOC trans- 
ports heat northwards to the sub- 
polar North Atlantic and to the 
Greenland, Iceland and Norwegian 
Seas. There, through a range of 
processes, deep water is formed 
that moves as a southward cold 
flow. This conveyor belt of north- 
ward-flowing, warm, shallow water 
and southward-flowing, cold, deep 
water defines the AMOC. 

Relative to latitudinal averages, 
surface temperatures could be 5°C 
cooler in the subpolar North Atlan- 
tic Ocean and up to 10°C cooler in 
the Norwegian Sea if the AMOC 
were absent. Consequently, a
strong AMOC is typically associ- 
ated with warming in the North- 
ern Hemisphere. This association 
is consistent with evidence from 
palaeoclimatology that suggests 
that, during the most recent ice age, 
warmer periods coincided with a
vigorous AMOC and colder periods coincided
with a weak AMOC.

Chen and Tung’s study emphasizes a differ- 
ent role for the AMOC in the modern climate. 
Atmospheric concentrations of greenhouse 
gases are currently being increased at a rate 
that is unprecedented in millennia and most 
likely millions of years. As a result, the role that
climate mechanisms might have had in the 
past might not be a good guide to their current 
or future role. The authors contend that half of 
the heat arising from ever-increasing green- 
house-gas concentrations is stored in the deep 
waters of the North Atlantic when the AMOC 
is increasing, thereby reducing overall global 
surface warming (Fig. 1). 


Figure 1 | The role of Atlantic circulation in the modern
climate. Increasing atmospheric concentrations of greenhouse gases
mean that more incoming solar radiation is trapped in the atmosphere, a
consequence of which is surface warming. Chen and Tung report that the
ocean circulation system known as the Atlantic Meridional Overturning
Circulation (AMOC) can offset this warming. The AMOC transports heat
northwards (red arrows) in the North Atlantic Ocean; the light- and dark-orange
colours indicate regions that are 5°C and 10°C warmer, respectively,
than latitudinal averages. The authors emphasize the role of the AMOC in
transporting heat from the surface to the deep ocean (black arrows); the
pink and blue colours indicate regions where heat increases and decreases,
respectively, when the AMOC is increasing (based on Fig. 2a of ref. 2).


The authors show that a cycle of increasing 
and then decreasing AMOC from the 1940s 
to the mid-1970s coincided with a period of 
global-warming slowdown; a quiescent period 
of weak AMOC from the mid-1970s to the late 
1990s coincided with rapid global warming; 
and an increase in AMOC strength from the 
late 1990s to 2005 and a decrease thereafter 
coincided with the ‘hiatus’ in global warming 
(see Fig. 3 of the paper).

When the causes of the ‘hiatus’ were first 
being investigated, the Atlantic was not an 
obvious place to look. The focus was on the 
Pacific Ocean because the tropical Pacific was 
one of the few places where surface temperatures
did not rise during this period. Understanding
of the event developed as several
factors were taken into account, including the
effect of changes in ocean heat content across
multiple ocean basins. Chen and Tung now
bring focus to the North Atlantic. Their work
suggests that the warm surface temperatures
there were indicative of an increasing AMOC
and that the associated increase in ocean heat
uptake played a key part in the ‘hiatus’.

One of the main caveats of Chen and Tung’s 
study is that, by necessity, the authors used 
proxies for AMOC strength because no direct 
observations of sufficient length exist. There 
are only four observatories that measure the 
AMOC across the full width of the 
Atlantic: SAMBA at 34.5°S, RAPID 
at 26°N, NOAC at 47°N and 
OSNAP between 53°N and 60°N. 
The longest-running, RAPID, 
was deployed in only 2004. These 
observatories need to be main- 
tained for many decades if we are 
to fully understand the role of the 
AMOC in our changing climate. 

There is much to be done to 
determine how the AMOC affects 
surface temperature in different 
regions and on different time- 
scales. For instance, Chen and 
Tung highlight the potential role of 
the Southern Ocean in heat uptake 
in the period since 2005. Such a 
feature could be part of a see-saw 
pattern of alternating heat uptake 
by the North Atlantic and Southern 
Ocean. 

There is also a distinct dif- 
ference between the effects of 
decadal AMOC variability and of 
an AMOC collapse on global tem- 
peratures. Although the prospect 
of the AMOC passing a tipping 
point and collapsing is considered 
unlikely, it is not impossible, and 
an event this dramatic could lead 
to global surface cooling’. The 
threshold between a weak AMOC 
that reduces ocean heat uptake, 
allowing global surface tempera- 
tures to rise unabated, and a very 
weak or collapsed AMOC that
causes cooling in the North Atlantic and global
surface warming to slow or stop will be a key 
point of debate. 

The AMOC is deemed “very likely” to 
weaken in the coming decades’. Indeed, the 
Atlantic has seen muted rises in surface tem- 
perature relative to the global ocean over the 
past few decades. This relative lack of warming 
has been interpreted as a fingerprint of AMOC 
decline, potentially linked to anthropogenic 
climate change*®. Whether the AMOC obser- 
vatories will document the predicted decline 
remains to be seen, but they have already 
observed that the AMOC is in a weakened 
state’. Chen and Tung predict that such a weak 
AMOC will result in a period of rapid global 
surface warming that could last for more than 
two decades.
Speciation far from the
madding crowd 


New species of marine fishes are found to emerge at a faster rate in high-latitude 
oceans, which have lower densities of species, than in the species-rich tropics. 
Are the tropics too crowded for new species to take hold? SEE LETTER P.392 


ARNE O. MOOERS & DAN A. GREENBERG


The tropics are, like many cities, hot, busy
and crowded. It was previously thought’
that these conditions in the tropics generate
a hotbed for the formation of new species
(speciation). Species diversity is remarkably 
high in the tropics and declines toward the 
poles. However, newly developed tools to meas- 
ure speciation rates, coupled with ever-growing 
global data sets, have enabled the surprising 
finding that terrestrial speciation rates for the 
past few million years are similar across differ- 
ent latitudes” or increase outside the tropics’. 
On page 392, Rabosky et al.‘ document a spe- 
ciation rate for marine fishes at high latitudes 
that is twice the speciation rate in tropical seas. 
This high speciation rate in cold, species-poor 
waters poses an interesting conundrum for evo- 
lutionary biologists and ecologists. 

There are two potential drivers of high 
speciation rates in the tropics. First, the ele- 
vated temperatures in the region both speed up 
metabolism, increasing the number of muta- 
tions, and decrease generation times. This is a 
potentially powerful combination, producing 
more of the variation necessary for evolution 
and the possibility of faster evolution. A second 
possible driver is ecological opportunity. The 
energy-rich tropics offer abundant resources 
that can support many different niches. And the
tropics are so rich in species that the interactions
of members of a single species with its competi- 
tors, predators and parasites might differ from 
place to place, leading to different adaptations 
and eventual divergence into new niches’. 
Although this narrative makes for a compelling 
theory, Rabosky and colleagues’ discovery sug- 
gests a different story, at least for marine fishes. 

The authors gathered genetic data for 
11,638 species of marine and freshwater fish, 
along with information on inferred evolution- 
ary relationships based on taxonomic group- 
ings for 19,888 additional fish species for 
which genetic data were not available. Using 
these data, and information from 139 dated 
fossil fishes, the authors generated a large set 
of plausible phylogenetic trees detailing the 
evolutionary relationships between all living 
marine fishes, and, crucially, estimates of when 
different lineages diverged from one another. 
These dated trees enable speciation rates to 
be inferred on the basis of the branching pat- 
terns of the tree. Species connected by short 
branches, and with many close relatives, have 
high speciation rates, whereas species that are 
separated by long branches and that have few 
close relatives have low speciation rates. 
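
The branch-length reasoning in this paragraph can be made concrete with a small sketch. The article does not name the estimator used, so the choice here of the 'DR statistic' (the inverse of the equal-splits measure, one commonly used tip-rate summary) is an assumption made for illustration, and the branch lengths are hypothetical.

```python
def dr_statistic(edge_lengths_tip_to_root):
    """Tip speciation-rate proxy (inverse of the equal-splits measure).

    edge_lengths_tip_to_root: branch lengths (in millions of years) on the
    path from a living species back to the root, terminal branch first.
    Each branch further from the tip is down-weighted by another factor of 2.
    """
    weighted_path = sum(
        length / 2 ** depth
        for depth, length in enumerate(edge_lengths_tip_to_root)
    )
    return 1.0 / weighted_path


# Hypothetical species in a recent, bushy radiation: short branches.
young_crowded = dr_statistic([0.5, 0.5, 1.0, 2.0])
# Hypothetical species on a long, isolated branch with few close relatives.
old_isolated = dr_statistic([20.0, 10.0])

print(f"bushy lineage:    {young_crowded:.2f} per Myr")
print(f"isolated lineage: {old_isolated:.2f} per Myr")
```

Short, densely branching paths give a large value and long, sparsely branching paths give a small one, which is the qualitative pattern the paragraph describes.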

Most taxonomic groups are made up of 
lineages with both low and high speciation 
rates. The marine fishes in the authors’ large 
phylogenetic trees were no exception, with 
50 Years Ago


More than 7,000 people died
in traffic accidents last year in
Britain and nearly 94,000 were
seriously injured ... The causes
and possible preventative measures
were the subject of a recent
conference ... A police traffic
superintendent is reported to have
said that the defect which causes
nearly every accident lies in “the nut
holding the wheel”. One contributor
pointed out that road engineers 
should not assume that drivers are 
omniscient. “If asked to make more 
than one decision at a time, they will 
fail; if faced with a situation which 
can be misinterpreted, someone will 
eventually find the wrong meaning”. 
The effect of human fallibility is easily 
apparent in the statistics ... during 
the last three months of 1967 ... after 
the breathalyser test came into 
operation ... driver and passenger 
casualties fell by 19 per cent, motor 
cyclist casualties by 16 per cent and 
pedal cyclists by 14 per cent. 

From Nature 20 July 1968 


100 Years Ago 


The utility of forests to a nation
is one of the economic factors
to its well-being which have
been brought to an unforeseen
prominence during the world-war: 
and perhaps to no other European 
nation has this ... development 
proved so startling, because so 
totally unsuspected, as to ourselves. 
Our woods were not grown from 
the commercial aspect — sport, 
amenity, and shelter to crops and 
stock were their main raison d’être.
We did not consider it necessary to 
grow woods for purely commercial 
reasons — that is, for the sake of the 
timber and pit wood and paper pulp, 
etc ... We obtained our requirements 
in these commodities by importing 
them from abroad, and relied on the 
Navy being able to safeguard these 
imports. We have now discovered 
our mistake and are paying for it. 
From Nature 18 July 1918 
Figure 1 | Remarkable fish diversification in remarkably cold waters. Rabosky et al.’ analysed global patterns of where new species of marine fishes arise.
The authors find that marine fishes that live near Earth’s poles, such as Antarctic icefishes (for example, the black-fin icefish, Chaenocephalus aceratus, shown 
here), give rise to new species much faster than do fishes in warmer and more species-rich tropical seas. 


speciation rates varying by more than 50-fold 
between lineages. The authors combined these 
values with global maps of where these species 
live, revealing a clear geographical structure to 
the speciation rates. Because small biases inte- 
grated over large amounts of data could pro- 
duce misleading inferences in these sorts of 
studies®, the authors considered several ways 
to estimate speciation rates and to map species’ 
location. But whether the authors considered 
the patterns looking species by species, place 
by place or ecoregion by ecoregion, there was 
always a pattern of the average speciation rate 
increasing from the tropics towards the poles. 

Rabosky and colleagues consider only a few 
potential mechanisms that might affect the rate 
at which marine fishes produce new species at 
high latitudes. At higher latitudes, marine fishes 
tend to have longer generation times and slower 
metabolisms’ than have fishes in the tropics, 
suggesting that such extended generation times 
and lower mutational input do not limit spe- 
ciation rate in these cold-adapted lineages. The 
authors also tested and discounted the interest- 
ing possibility that high-latitude species are the 
descendants of tropical lineages that exhibited 
adaptations for cold-water living and also hap- 
pened to have high speciation rates. This nega- 
tive result suggests that a high-latitude marine 
environment, rather than the species that colo- 
nizes it, drives high speciation rates. 

The correlation of high speciation rates 
with low diversity is consistent with the idea 
that there are unfilled ecological opportunities 
near the poles. However, ecological opportu- 
nity is something that is inferred rather than 
witnessed’. The shape of a phylogenetic tree 
can indicate slowing speciation as species 
numbers rise — a pattern that is consistent 
with diminishing ecological opportunity’. If 
closely related species in such clades occupy 
different niches, as might be the case for the 
high-latitude Antarctic icefishes* (Fig. 1), this 
would be consistent with ecological opportunity
having a key role in driving their diversification.
Such analyses are needed to determine
whether high-latitude groups have reached the 
ecological limits of their ecosystems or whether 
high-latitude fish diversity might be expected 
to continue increasing. 

Continued diversification at higher latitudes 
might seem reasonable, given that Earth’s 
cooling over the past 30 million years or so” 
has given rise to the present, relatively young 
temperate and polar realms. However, the 
rate of speciation at high latitudes reported by 
Rabosky et al. (roughly 0.2 new species per species
per million years) is high. If this rate had been 
sustained over the whole of the past 30 mil- 
lion years, high latitudes would have tropical 
levels of species diversity by now. Given that 
the accumulation of diversity depends on both 
speciation and extinction rates, one explana- 
tion that reconciles a high speciation rate 
and low current diversity is if both speciation 
and extinction are elevated outside the trop- 
ics’. This could result in a pattern in which 
an increase in the number of species is limited 
by high extinction rates, and poleward realms 
would have few, but relatively young, species. 
Measuring extinction rates is almost as diffi- 
cult as trying to assess ecological opportunity, 
but new approaches that combine information 
on extinct species represented in the fossil 
record with information from their living rela- 
tives’ might offer a way to investigate whether 
extinction rates are greater at higher latitudes. 
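
The arithmetic behind this argument can be sketched in a few lines. The model below is an assumption made for illustration (a constant-rate birth-death process, in which the expected number of species grows as exp((speciation - extinction) x time)); the speciation rate and the 30-million-year window come from the text, whereas the extinction rate is a hypothetical value chosen only to show how strongly the outcome depends on it.

```python
import math

SPECIATION_RATE = 0.2   # new species per species per Myr (from the text)
DURATION_MYR = 30.0     # approximate length of the cooling period (from the text)


def expected_growth(speciation, extinction, duration):
    """Expected multiplication of species number under a constant-rate
    birth-death model: exp((speciation - extinction) * duration)."""
    return math.exp((speciation - extinction) * duration)


# No extinction: every lineage is expected to leave about 400 descendants.
no_extinction = expected_growth(SPECIATION_RATE, 0.0, DURATION_MYR)

# Hypothetical extinction rate of 0.15 per species per Myr: growth is
# reduced to less than fivefold over the same interval.
with_extinction = expected_growth(SPECIATION_RATE, 0.15, DURATION_MYR)

print(f"growth factor without extinction: {no_extinction:.0f}")
print(f"growth factor with extinction:    {with_extinction:.1f}")
```

Under these assumptions, a sustained rate of 0.2 with no extinction multiplies diversity several hundred-fold, whereas a comparably high extinction rate leaves few, relatively young species.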

The view of high-latitude oceans as ‘sleepy’ 
backwaters remote from the exciting evolu- 
tionary bustle of the tropics will need to change 
if it turns out that both speciation and extinc- 
tion of marine fishes occur at a faster pace 
beyond the tropics. Such a pattern would imply 
that living cheek by jowl, or rather gill by jaw, 
in the tropics is a condition that is more con- 
straining than productive, such that the real 
biodiversity action is taking place where there
is less, rather than more, biodiversity. Far from 
the madding crowd, as it were.
The News & Views article ‘Intestinal-niche
conundrum solved’ (Nature 558,
380-381; 2018) indicated that two papers
(M. Shoshkes-Carmel et al. Nature 557,
242-246; 2018, and B. Degirmenci et al.
Nature 558, 449-453; 2018) solved an
outstanding debate: the identity of a
stromal-cell population that sends Wnt
signals to intestinal stem cells. However,
a paper published earlier this year
(G. Greicius et al. Proc. Natl Acad. Sci. USA
115, E3173-E3181; 2018) also identified a
stromal-cell source for Wnt signals.


BRITISH ANTARCTIC SURVEY/SPL 


ARTICLE 


https://doi.org/10.1038/s41586-018-0298-5 


Electron ptychography of 2D materials to 
deep sub-ångström resolution


Yi Jiang, Zhen Chen, Yimo Han, Pratiti Deb, Hui Gao, Saien Xie, Prafull Purohit, Mark W. Tate, Jiwoong Park,
Sol M. Gruner, Veit Elser & David A. Muller


Aberration-corrected optics have made electron microscopy at atomic resolution a widespread and often essential tool
for characterizing nanoscale structures. Image resolution has traditionally been improved by increasing the numerical
aperture of the lens (α) and the beam energy, with the state-of-the-art at 300 kiloelectronvolts just entering the deep
sub-ångström (that is, less than 0.5 ångström) regime. Two-dimensional (2D) materials are imaged at lower beam
energies to avoid displacement damage from large momenta transfers, limiting spatial resolution to about 1 ångström.
Here, by combining an electron microscope pixel-array detector with the dynamic range necessary to record the complete
distribution of transmitted electrons and full-field ptychography to recover phase information from the full phase space,
we increase the spatial resolution well beyond the traditional numerical-aperture-limited resolution. At a beam energy
of 80 kiloelectronvolts, our ptychographic reconstruction improves the image contrast of single-atom defects in MoS₂
substantially, reaching an information limit close to 5α, which corresponds to an Abbe diffraction-limited resolution of
0.39 ångström, at the electron dose and imaging conditions for which conventional imaging methods reach only 0.98 ångström.


The ability to image individual atoms is essential for characterizing
structure and defects in 2D materials’. In scanning transmission electron
microscopy (STEM), the most common technique for achieving
atomic resolution is high-angle annular dark-field (ADF) imaging,
which records electrons scattered through large angles to form an
incoherent image. The maximum spatial information contained in


Fig. 1 | STEM imaging using the EMPAD. a, At each scan position, the
incident probe (ψ₀(k)) is focused on the sample and the entire diffraction
pattern of the exit wave (|ψ(k)|²) is recorded by the EMPAD. The blue and
yellow atoms represent molybdenum and sulfur atoms in the object plane.
ψ₀ and ψ refer to the incident and exit wavefunctions respectively; r is
the (x, y) positional coordinate in the real-space plane; and k is the (kx, ky)
wavenumber coordinate in the conjugate momentum-space plane.
b, c, Averaged diffraction pattern intensity (on a logarithmic scale) from
the electron beam at the marked scan positions near a molybdenum
column. Insets show the intensity (on a linear scale) of the bright-field
disks. The substantial intensity differences at large scattering angles
provide contrast information for ADF imaging and are essential for
resolution enhancement in ptychography.
Fig. 2 | Comparison of different imaging techniques using 4D EMPAD
dataset measured from monolayer MoS₂. a, Coherent bright-field image.
b, Incoherent ADF image. c, iCoM image. d, Phase of the transmission
function reconstructed using full-field ptychography. The red arrows
indicate a sulfur monovacancy that is detectable in ptychography.
e-h, False-colour diffractogram intensities (on a logarithmic scale) of
the bright-field (e), ADF (f), iCoM (g) and full-field ptychography (h)
images. The information limit (white circle) of ptychography is close to
5α (107 mrad); the information limits of the other imaging methods are
also shown. i, Close-up of h. j, Line profile along the dotted horizontal
white line in i (linear scale) across two diffraction spots. The peak at 5α
corresponds to an Abbe resolution of 0.39 Å.


an ADF image (or other incoherent imaging modes) is determined 
by the momentum transfer across the diameter of the probe-forming 
aperture, that is, twice the semi-convergence angle (α)*°. Therefore,
Fig. 3 | Real-space resolution test of full-field ptychography using
twisted bilayer MoS₂. The two sheets are rotated by 6.8° with respect to
each other, and the misregistration of the molybdenum atoms provides
a range of projected distances that vary from a full bond length down
to complete overlap. Atoms are still cleanly resolved at a separation of
0.85 ± 0.02 Å, with a small dip still present between atoms separated by
about 0.61 ± 0.02 Å, similar to the contrast expected for the Rayleigh
criterion for conventional imaging. Atom-pair peaks at 0.42 ± 0.02 Å show
a 6% dip at the midpoint, suggesting that the Sparrow limit lies just below
0.4 Å. The Rayleigh resolution for ADF STEM is 1.2 Å for these imaging
conditions (Extended Data Fig. 3a).


obtaining high-resolution images generally requires small wavelengths 
and large apertures, and the latter in turn introduces phase-distorting 
artefacts from geometrical and chromatic aberrations. The demon- 
stration of practical aberration correctors®’ has ameliorated these 
phase errors substantially, and for the past decade the state-of-the-art 
for ADF images has reached the deep sub-ångström regime of about
0.5-Å resolution at 300 keV*’, which is sufficient for imaging most
bulk materials. On the other hand, the characterization of 2D mate- 
rials, such as single-defect detection and imaging of interface or edge 
structures, always requires lower beam energies (roughly 20-80 keV) 
to minimize knock-on damage”''!. Because lower energies imply 
longer electron wavelengths, the resolution of ADF imaging is reduced 
substantially and reaching sub-ångström resolution is possible only
with specialized correctors that correct both geometric and chromatic 
aberrations or with monochromatic electron beams!*!3. Moreover, 
ionization damage, which cannot be avoided by lowering the beam 
voltage, also restricts the electron dose applied to the sample, limiting 
the ultimately achievable signal-to-noise ratio”, further reducing 
image resolution and contrast. 

However, it has long been recognized that the information limit set 
by diffractive optics is not an ultimate limit’®. There is phase infor- 
mation encoded throughout a diffraction pattern formed from a 
localized electron beam, in the form of interference patterns between 
overlapping scattered beams (Fig. 1a). As the incident localized beam 
is scanned, this phase information and hence the interference patterns 
change in a predictable manner that can be used to retrieve the phase 
differences—an approach known as ptychography'*'8. Although orig- 
inally conceived to solve the phase problem in crystallography, modern 
ptychography is equally applicable to non-crystalline structures !?-” 
and has received renewed attention as a dose-efficient technique” for 
recovering the projected potential of thin materials, with modifications 
for measuring finite-thickness and three-dimensional samples”>”*. In 
principle, the resolution is limited by the largest scattering angle at 
which meaningful information can still be recorded; however, because 






Fig. 4 | Ptychographic reconstructions using data with different cutoff
angles. a-d, Ptychographic reconstructions using electrons collected
using cutoffs of 1-4 times the aperture size (α). The averaged diffraction
patterns are shown in the lower-left corner of each image. e-h, False-colour
diffractograms (on a logarithmic scale) of the reconstructions
in a-d. The white circles indicate the information limit. i, Line profiles


electron-scattering form factors have a very strong angular dependence, 
the signal falls rapidly with scattering angle, so a detector with high 
dynamic range and sensitivity is required to exploit this information. 

Ptychography has been widely adopted for light?” and X-ray!*.8 
applications, but the technique is still underexplored in transmission 
electron microscopy, in large part because of the detector challenges. 
Traditional electron cameras such as charge-coupled devices (CCDs) 
and pixelated detectors have been hampered by slow readout speed 
or poor dynamic range. Previous work**?*** has mainly made use 
of electrons only within the bright-field disk; therefore, the image 
resolution did not overcome the 2α limit imposed by the physical
aperture. The first attempt!” at demonstrating super-resolution 
ptychography involved phasing the Fourier coefficients of silicon out 
to the (400) reflection to reconstruct the unit cell with a resolution 
of 1.36 Å. However, this result determined only structure factors,
limiting its application to periodic crystalline structures. A more recent 
demonstration” for a lower-resolution scanning electron microscope 
equipped with a CCD camera showed that the resolution of iterative 
ptychographic reconstructions can be improved when using informa- 
tion at higher scattering angles. 

There are three challenges to improving resolution and dose 
efficiency to the point needed to advance beyond the current 
across three sulfur columns (indicated by the vertical red dashed line in
d). The columns containing two sulfur atoms are labelled 2S; the column
with one sulfur atom is labelled S. j, Line profiles across the reconstructed
probe function (dashed line in k) at different cutoffs. k, Probe profile
reconstructed by ptychography using the dataset with a 4α cutoff.


state-of-the-art diffractive imaging. First, a detector must be able to 
record the full range of scattered intensities without introducing non- 
linear distortions or saturating the central beam. Second, the detector 
must not only possess single-electron sensitivity, but also retain a high 
detective quantum efficiency when summing over the large ranges of 
empty pixels at high scattering angles. Third, each diffraction pattern 
must be recorded rapidly enough that the full image is not sensitive to 
drift and instabilities in the microscope, which usually leaves only a few 
minutes to record a full four-dimensional (4D) dataset. The combina- 
tion of the first and third conditions poses an additional constraint that 
the detector must also have a high dynamic current range. Therefore, 
it is not sufficient to count single electrons for a long time at a low 
beam current; instead, large currents per pixel need to be recorded in 
very short times. Most pulse-counting methods are limited to about 
2-10 MHz by the transit time of the electron cloud through the silicon 
detector. This translates to 0.3-1.6 pA per pixel, although few systems 
reach this limit. Instead, to keep nonlinearities below 10%, a limit of 
0.03 pA per pixel is more typical*”. Direct charge integration in a CCD
geometry is even more limited by the well depth, to about 20 electrons
per pixel per frame. At a 1-kHz frame rate, this corresponds to 0.003 pA per
pixel—a limit at which single-frame Poisson statistics would then be 
below the Rose criterion for contrast detectability*®. 
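
The per-pixel currents quoted in this paragraph follow from multiplying an electron arrival rate by the elementary charge. The short check below is a sketch of that arithmetic only; the function name is introduced here for illustration and the rates are the ones given in the text.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron


def current_in_picoamps(electrons_per_second):
    """Convert an electron arrival rate into a current in picoamperes."""
    return electrons_per_second * ELEMENTARY_CHARGE_C * 1e12


# Pulse counting limited to about 2-10 MHz per pixel.
pulse_low = current_in_picoamps(2e6)
pulse_high = current_in_picoamps(10e6)

# CCD-style charge integration: about 20 electrons per pixel per frame
# at a 1-kHz frame rate.
ccd_limit = current_in_picoamps(20 * 1e3)

print(f"pulse counting: {pulse_low:.2f}-{pulse_high:.2f} pA per pixel")
print(f"CCD integration: {ccd_limit:.4f} pA per pixel")
```

These values agree with the 0.3-1.6 pA and 0.003 pA per pixel figures stated above.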
Fig. 5 | Simulation study of full-field ptychography as a function of
cutoff angle and beam current. a, b, Reconstruction resolution evaluated
by the maximum range of the reconstructed phase (a) and the root-mean-square
(r.m.s.) width of the molybdenum atom defined by the standard
deviation of a fitted Gaussian (b). At large electron doses, the resolution
is determined by the maximum detector angle. As current decreases, the
resolution is instead limited by the Poisson noise. c-e, Reconstructed
phase maps using diffraction patterns with 4α cutoff at beam currents
of 0.01 pA (c), 0.1 pA (d) and 10 pA (e), as indicated by the arrows. The
increase in the phase range and the decrease of the r.m.s. atom width at
large dose (more than 1 pA) is a measure of the resolution improvement,
but the increase in the phase range at low dose (less than 1 pA) reflects
increasingly large noise fluctuations in the reconstructions, which are also
evident in the increased variations of the r.m.s. atom width.


To overcome these challenges, we developed an electron micro- 
scope pixel-array detector (EMPAD)*® that is capable of recording 
all the transmitted electrons with sufficient sensitivity and speed 
to provide a complete ptychographic reconstruction. Our EMPAD 
design has a high dynamic range of 1,000,000:1 while preserving 
single-electron sensitivity with a signal-to-noise ratio of 140 for a 
single electron at 200 keV*’. The detector retains a good performance 
from 20 keV to 300 keV. Here we operate at 80 keV, at which the 
signal-to-noise ratio per pixel is 50 for a single electron, the detective 
quantum efficiency is 0.96 and the maximum beam current per pixel 
is 5 pA. By using essentially all of the electrons collected (99.95% of 
the transmitted beam, as determined using multi-slice simulations), 
with a full 4D dataset acquired in typically a minute, our full-field 
ptychographic reconstructions roughly double the image resolution 
compared to the highest-resolution conventional single-channel 
imaging modes, such as integrated centre-of-mass (iCoM)*°*! and
ADF STEM. 
Data acquisition and reconstruction

In Fig. 1a we show a schematic of the experimental configuration with 
the EMPAD. To minimize radiation damage, a monolayer of MoS₂
is imaged at a primary beam energy of 80 keV. At each (x, y) scan
position, the EMPAD records a diffraction pattern (kx, ky) from the
convergent probe, thus forming a 4D dataset (x, y, kx, ky). In Fig. 1b, c
we show averaged diffraction patterns corresponding to two positions 
near a single molybdenum column. Supplementary Videos 1 and 2 
show continuous evolutions of averaged and raw diffraction patterns, 
respectively, at various scan positions, where, compared to Fig. 1b, c 
it is easier to observe the intensity variations in the overlaps between 
the higher-order diffraction disks that occur as the relative phase of 
the interfering beams changes with position. The considerable changes
in the distribution outside the central disk provide essential contrast
information in ADF images; the resolution improvement in full-field
ptychography over previous bright-field ptychographic methods stems 
from exploiting the phase information encoded in the contrast between 
overlapping higher-order disks. The angle-averaged radial distribu- 
tion function of the position-averaged diffraction pattern (Extended 
Data Fig. 1) shows a four-orders-of-magnitude intensity range for 
our dataset. The reconstruction algorithm is implemented using the 
extended ptychographic iterative engine (ePIE) algorithm**”’, which 
reconstructs the transmission function iteratively and refines the probe 
function to accommodate aberrations and noise. We also compare 
our performance to the simpler Wigner-distribution deconvolution 
(WDD)*, which in its simplest form assumes a known probe function 
and uses the information within the central disk, and shows a similar 
performance to using ePIE on only the central disk (that is, with a 
cutoff α) or iCoM imaging. In principle, WDD could also utilize the
dark-field signal and surpass the aperture-limited resolution if a proper 
de-noising strategy is applied**“*. 
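
For readers unfamiliar with ePIE, the sketch below shows one update at a single scan position in the form in which the algorithm is usually stated: the modulus of the far-field wave is replaced by the measured one, and the correction is fed back to both the object and the probe. It is an illustration under stated assumptions, not the authors' implementation: the function name, the step sizes and the use of a plain FFT as the propagator are all choices made here, and details such as scan ordering, noise handling and sub-pixel shifts are omitted.

```python
import numpy as np


def epie_update(obj_patch, probe, measured_intensity,
                obj_step=1.0, probe_step=1.0):
    """One ePIE update at a single scan position (illustrative sketch).

    obj_patch:          complex transmission function under the probe
    probe:              complex probe function (same shape as obj_patch)
    measured_intensity: recorded diffraction pattern, in FFT pixel ordering
    obj_step, probe_step: hypothetical feedback parameters
    """
    exit_wave = probe * obj_patch
    far_field = np.fft.fft2(exit_wave)

    # Keep the estimated phase but impose the measured modulus.
    corrected = np.sqrt(measured_intensity) * np.exp(1j * np.angle(far_field))
    revised_exit_wave = np.fft.ifft2(corrected)
    difference = revised_exit_wave - exit_wave

    # Feed the correction back to the object and to the probe.
    new_obj = obj_patch + obj_step * np.conj(probe) * difference / (
        np.max(np.abs(probe)) ** 2)
    new_probe = probe + probe_step * np.conj(obj_patch) * difference / (
        np.max(np.abs(obj_patch)) ** 2)
    return new_obj, new_probe


# Consistency check on synthetic data: if the measurement already matches
# the current estimates, the update leaves them unchanged.
rng = np.random.default_rng(0)
true_obj = np.exp(1j * rng.uniform(-0.5, 0.5, (16, 16)))  # pure phase object
true_probe = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
pattern = np.abs(np.fft.fft2(true_probe * true_obj)) ** 2

obj_after, probe_after = epie_update(true_obj, true_probe, pattern)
print("max object change:", np.max(np.abs(obj_after - true_obj)))
```

In a full reconstruction this update is repeated over all scan positions for many iterations, which is how the probe is refined alongside the transmission function.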

The 4D EMPAD data can generate all elastic imaging modes for 
benchmarking from the same dataset, including coherent bright-field, 
iCoM and ADF modes. As shown in Fig. 2a, e, the coherent bright-field 
image has the poorest resolution (restricted to within α, as expected).
The incoherent ADF image (Fig. 2b, f) doubles the information limit
(from α to 2α), but is limited by a low signal-to-noise ratio and residual
probe aberrations. Although the iCoM image is less noisy, its resolu-
tion is still within 2α (Fig. 2g) because the structural information is
influenced by the incident probe via convolution. By contrast, full-field
ptychography recovers the phase of the transmission function directly
(Fig. 2d) and achieves an information limit of 5α (Fig. 2h). Noise arte-
facts are also reduced substantially and the light-atom sulfur mono-
vacancy (indicated by red arrows) is resolved more clearly. In Fig. 2i, j
we show an enlarged section of the Fourier intensity map from the
ptychographic reconstruction and a line profile across a diffraction spot
at the 5α limit, demonstrating an estimated Abbe resolution” of 0.39 Å
or better (there are higher-order spots of weaker intensity but they are
not as uniform in all directions). For comparison, with our electron
optical conditions, the expected Abbe resolution for conventional inco-
herent imaging modes such as ADF STEM is 2α or 0.98 Å.
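
How several imaging modes can be generated from one 4D dataset is easiest to see in code. The sketch below applies simple virtual detectors to an array laid out as (scan y, scan x, ky, kx). The array layout, the function name and the detector definitions are assumptions made for illustration: here the whole bright-field disk is summed, everything outside it is treated as the dark-field signal, and only the centre-of-mass components are returned (an iCoM image would additionally require integrating that vector field).

```python
import numpy as np


def virtual_images(data4d, kx, ky, aperture):
    """Form simple virtual-detector images from a 4D-STEM dataset.

    data4d:   array of shape (ny, nx, nky, nkx), one diffraction pattern
              per scan position
    kx, ky:   arrays of shape (nky, nkx) giving the scattering angle of
              each detector pixel (same units as `aperture`)
    aperture: semi-convergence angle of the probe
    """
    k = np.hypot(kx, ky)
    inside = k <= aperture   # pixels within the bright-field disk
    outside = ~inside        # pixels scattered beyond the disk

    bright_field = data4d[..., inside].sum(axis=-1)
    dark_field = data4d[..., outside].sum(axis=-1)

    # Centre of mass of each diffraction pattern.
    total = data4d.sum(axis=(-2, -1))
    com_x = (data4d * kx).sum(axis=(-2, -1)) / total
    com_y = (data4d * ky).sum(axis=(-2, -1)) / total
    return bright_field, dark_field, com_x, com_y


# Tiny synthetic example: uniform patterns on a 5 x 5 detector.
axis = np.linspace(-2.0, 2.0, 5)
kx_grid, ky_grid = np.meshgrid(axis, axis)
uniform = np.ones((2, 3, 5, 5))
bf, df, cx, cy = virtual_images(uniform, kx_grid, ky_grid, aperture=1.0)
print(bf.shape, bf[0, 0], df[0, 0], cx[0, 0], cy[0, 0])
```

Because every mode is computed from the same recorded patterns, the comparison between modes is made at identical dose and imaging conditions.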

A second measure of spatial resolution is the minimum resolvable 
distance between two atoms. For 2D materials, this measure is com- 
plicated by the fact that it requires atoms to be spaced closer than the 
shortest known bond lengths. To accomplish this test, we use a twisted 
bilayer sample of two MoS₂ sheets rotated by 6.8° with respect to each
other. This effectively creates an incommensurate atomic moiré pattern,
which provides projected Mo-Mo atomic distances that vary from a
full bond length to fully superimposed atoms, with many intermediate
distances across the incommensurate moiré quasi-periodicity of 28 Å
(see, for example, Fig. 1c of ref. “°). In Fig. 3 we show the ptychographic
reconstruction across a moiré supercell, in which atomic columns
midway between the aligned regions are resolved as separate atoms at
0.85 ± 0.02 Å. The dominant uncertainty here, and in our other distance
measurements, is the systematic error from scan distortions rather than
random errors from counting statistics. The dip between adjacent col-
umns can still be seen at 0.60 ± 0.02 Å, close to the Rayleigh limit
Fig. 6 | Comparison between ptychographic techniques and low-angle
ADF imaging at low electron doses. a, b, Ptychographic reconstructions
of simulated data with an in-focus probe, using the WDD (a) and ePIE
(b) methods. c, ePIE reconstructions of data with a large defocused probe.
d, Low-angle ADF image (integrating from 1α to 4α) using the same


for resolution. Atom-pair peaks measured at 0.42 ± 0.02 Å show a 6%
dip at the midpoint (line profiles through atom pairs are shown in
Extended Data Figs. 2, 3). From a rigid model structure of the rotated
bilayer, assuming that no relaxation occurs (even though some probably
does), the model separations for the atom pairs marked in Fig. 3 are
predicted to be 0.87 Å, 0.60 Å and 0.36 Å. Although not all atoms can
be reconstructed because of scan noise, we have multiple moiré repeats
to distinguish random from systematic errors. Ignoring source-size
contributions, the expected Rayleigh limit for an incoherent imaging
mode for this experimental condition is 1.2 Å, and many atom pairs
are completely unresolvable in the ADF image (Extended Data Fig. 3a).
Our full-field ptychographic reconstruction demonstrates double the
Rayleigh resolution compared to conventional 2α imaging methods.
Moreover, some closely spaced atoms lose the central dip at just below
0.40 Å (the Sparrow’ criterion for resolution), close to the Abbe limit
estimated from Fig. 2.

To understand how dark-field electrons contribute to resolution 
improvement, we performed additional reconstructions using diffrac- 
tion patterns with outer cutoff angles varying from one to four times the 
aperture size (la—4a). As shown in Fig. 4a—h, when using only the cen- 
tral bright-field disk (1a), the reconstructed phase (Fig. 4a) has a rela- 
tively low resolution, similar to that of the ADF and iCoM images. As 
the cutoff increases, atoms become sharper and more clearly resolved 
(Fig. 4a-d). Higher-spatial-frequency information also appears in the 
diffractograms (Fig. 4e-h). Beyond 3a, where there are fewer scattered 
electrons, the improvements become less obvious and the reconstruc- 
tion is limited mainly by the electron dose. As discussed in more detail 
in the following section, increasing the collection angle beyond the 
point at which there is meaningful signal in the diffraction pattern does 
not introduce high-spatial-frequency artefacts. Instead, the reconstruc- 
tion retains its limiting form. We note that the reconstructed amplitude 


simulated dataset as for a and b. The defocused ePIE approach (c) shows 
a dose advantage over the other two ptychography approaches (a, b) by 
a factor of roughly two or better, and over low-angle ADF imaging 
(d)—currently the optimal single-channel imaging method for 2D 
materials—by a factor of roughly four. 


of the transmission function also shows the atomic structure of MoS. 
The amplitude modulations are weak, suggesting that the specimen is 
close to a pure phase object (Extended Data Fig. 4). 

As a test of linearity, Fig. 4i shows that the phase at the position of the 
sulfur monovacancy, where only one sulfur atom is present, is about 
half of the phase shift of the two-sulfur sites, validating the strong- 
phase approximation and ePIE reconstruction for these thin 2D mate- 
rials. That the reconstructed probes (Fig. 4j, k) have similar shapes at 
different cutoffs also indicates that it is the dark-field electrons that 
contribute to resolution improvement. The asymmetric probe shape 
is due to residual aberrations and agrees with measurements using the 
singular-value-decomposition approach”’. 


Influence of electron dose 

We explored the potential limits of full-field ptychography further 
using simulated datasets for a wide range of collection angles and 
beam currents, including cases where the cutoff is extended beyond 
most of the scattered electrons and where the dose is too small for a 
stable reconstruction to be achieved. We evaluate the image quality by 
both the range of the reconstructed phase and the root-mean-square 
width of the molybdenum atoms measured from the standard devia- 
tion of a Gaussian fit. These measures capture the trends in the height 
and width of the atom peak, respectively. At high dose, the ultimate 
information limit of the ptychographic reconstruction is expected to 
be twice the cutoff angle (that is, 8α for a 4α cutoff). In practice, as
shown in Fig. 5, ptychographic reconstructions are influenced mainly
by the electron dose, and these limits are not reached for the larger
cutoffs. There is only a slight improvement between the 3α and 4α
cutoff at a typical operating beam current (1-50 pA), in agreement
with the experimental data in Fig. 4. If the beam current is too low
(for example, 0.01 pA, which corresponds to a dose of 260 electrons
per Å²), atoms become distorted with reduced image resolution, but
some of the overall structure of MoS₂ is still recognizable (Fig. 5c). As
the beam current increases, the influence of Poisson noise becomes 
less important because there are sufficient electrons scattered into high 
angles to provide interference between higher-order lattice planes. At 
higher doses, the resolution of the ptychographic reconstruction bene- 
fits more fully from the increased maximum collection angle. For large 
collection angles and high doses, a diminishing return eventually sets 
in, with the resolution scaling logarithmically with the dose, suggesting 
that in practice the resolution will ultimately be limited by dose rather 
than the finite size of the scattering potential (root-mean-square width 
of about 0.1 Å) or thermal vibrations. Because the EMPAD has a high
detective quantum efficiency, increasing the cutoff angle beyond where 
there is signal does not compromise resolution or introduce additional 
artefacts, as demonstrated by the fact that all curves in Fig. 5a collapse 
to the same trend as the beam current decreases. 
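The link between beam current and dose used in this section can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes that each probe position receives the charge delivered during one exposure and that this charge is spread over one scan-step square. The function names are hypothetical. With the experimental values given in Methods (8.2 pA, 1 ms per frame, 0.21 Å step) it gives roughly 1.16 × 10⁶ electrons per Å², the experimental dose quoted in Extended Data Fig. 8.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per electron


def electrons_per_position(beam_current_pA, dwell_time_s):
    """Number of electrons delivered to one probe position (assumed model)."""
    return beam_current_pA * 1e-12 * dwell_time_s / ELEMENTARY_CHARGE


def dose_per_square_angstrom(beam_current_pA, dwell_time_s, scan_step_A):
    """Dose in electrons per square angstrom.

    Assumption: the electrons of one probe position are attributed to an
    area of scan_step_A x scan_step_A.
    """
    return electrons_per_position(beam_current_pA, dwell_time_s) / scan_step_A ** 2


# Experimental conditions from Methods: 8.2 pA, 1 ms exposure, 0.21 angstrom step.
experimental_dose = dose_per_square_angstrom(8.2, 1e-3, 0.21)
print(f"{experimental_dose:.3g} electrons per square angstrom")
```

Because the dose scales with the inverse square of the scan step, doses quoted for simulated datasets depend on the step size and exposure assumed for those simulations.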

In Fig. 6 we compare the performance of ePIE and WDD ptychographic
reconstructions at low electron dose. Using the same datasets,
simulated with a small in-focused probe, both methods yield similar
results and can achieve atomic resolution at around 500 electrons
per Å². On the other hand, using a large defocused probe, the ePIE
technique can improve reconstruction quality beyond that of WDD
(Fig. 6c). Overall, ptychographic reconstructions are more dose-efficient
than are low-angle ADF reconstructions (integrating from 1α
upwards; Fig. 6d); low-angle ADF is currently the most dose-efficient
single-channel STEM imaging mode for single-atom detection*”. For
a more typical range of ADF angles, such as the experimental data of 
Fig. 2b, the effects are more pronounced (see, for example, Fig. 2d). 
As shown in Extended Data Fig. 5, the advantage of ptychography 
over ADF imaging becomes more noticeable for materials with lighter 
elements, such as graphene. 


Discussion 

In addition to the beam current and detector configuration required for 
high-resolution ptychographic reconstruction, other practical sources 
of errors such as sample contamination and scanning drift may cause 
distortions and reduce reconstruction quality. However, we have found 
that full-field ptychography outperforms all other techniques that we 
have tested under the same conditions (Extended Data Fig. 6). By incor- 
porating other physical constraints and prior knowledge, we envisage 
that more advanced reconstruction strategies, when applied to full- 
field electron ptychography data, could compensate for inaccurate scan 
positions or make allowances for thick specimens with strong dynamic 
scattering. 

In summary, we have demonstrated that with the entire distribution 
of scattered electrons collected by the EMPAD, full-field ptychography 
greatly enhances image resolution and contrast compared to tradi- 
tional electron-imaging techniques, even at low beam voltages. With 
our improved detector, atomic-scale ptychographic reconstructions are 
no longer restricted by the aperture size. Instead, image quality is determined
by the electron dose and collection angle. Our technique provides
an efficient tool for unveiling sub-ångström features of 2D or dose-sensitive
materials. Combined with the ultralow-voltage aberration-corrected
microscopes that have recently been developed, it has the
potential to tackle currently hard problems such as direct imaging 
of lattice displacements in twisted-layer structures or of structural 
distortions around single-atom dopants and vacancies, and even three- 
dimensional tomography. 
METHODS


EMPAD data acquisition. The 4D dataset of monolayer MoS₂ was taken using an
aberration-corrected FEI Titan with 8.2-pA beam current, 80-keV beam energy
and 21.4-mrad aperture size, with the dose limited by the radiation resistance of
the sample. The EMPAD has 128 × 128 pixels and a readout speed of 0.86 ms per
frame. The exposure time was set to 1 ms per frame in all experiments. 51 × 87
diffraction patterns with a scan step size of 0.21 Å were used to generate bright-field,
ADF and iCoM images in Fig. 2a-c. The ADF image was integrated from
64.2 mrad (3α) to 85.6 mrad (4α). Higher angles did not add important contributions
to the signal. The dataset of twisted MoS₂ in Fig. 3 was taken with the same
beam conditions except for a 10.1-pA beam current. 68 × 68 diffraction patterns
with a scan step size of 0.59 Å were used for ptychographic reconstruction.
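As a cross-check of these beam parameters, the short sketch below computes the relativistic electron wavelength at 80 keV and two quantities that are quoted elsewhere in the paper: the Rayleigh limit for incoherent imaging (about 1.2 Å) and the projection-thickness limit for an 80-mrad cutoff (about 1.3 nm). The Rayleigh expression 0.61λ/α is the standard textbook form and is an assumption here, because the text does not state which expression was used. The function names are illustrative.

```python
import math

PLANCK = 6.62607015e-34            # J s
ELECTRON_MASS = 9.1093837015e-31   # kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C
LIGHT_SPEED = 2.99792458e8         # m / s


def electron_wavelength_angstrom(beam_energy_eV):
    """Relativistic de Broglie wavelength of the beam electrons, in angstroms."""
    energy_J = beam_energy_eV * ELEMENTARY_CHARGE
    rest_energy_J = ELECTRON_MASS * LIGHT_SPEED ** 2
    momentum = math.sqrt(
        2.0 * ELECTRON_MASS * energy_J * (1.0 + energy_J / (2.0 * rest_energy_J))
    )
    return PLANCK / momentum * 1e10


def rayleigh_limit_angstrom(wavelength_A, semi_angle_rad):
    """Rayleigh limit for incoherent imaging, assumed to be 0.61 * lambda / alpha."""
    return 0.61 * wavelength_A / semi_angle_rad


def projection_thickness_limit_nm(wavelength_A, cutoff_angle_rad):
    """Thickness limit lambda / [2 sin^2(theta_max / 2)], converted to nanometres."""
    return wavelength_A / (2.0 * math.sin(cutoff_angle_rad / 2.0) ** 2) / 10.0


wavelength = electron_wavelength_angstrom(80e3)           # 80-keV beam
rayleigh = rayleigh_limit_angstrom(wavelength, 21.4e-3)   # 21.4-mrad aperture
thickness = projection_thickness_limit_nm(wavelength, 80e-3)  # 80-mrad cutoff
print(f"wavelength {wavelength:.4f} A, Rayleigh {rayleigh:.2f} A, "
      f"thickness limit {thickness:.2f} nm")
```

Under these assumptions the three values agree with the figures quoted in the text to the stated precision.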
Ptychographic reconstructions. Before reconstruction, all diffraction patterns
are padded with zeros to a total size of 256 × 256 and thus the pixel size in the
reconstructed phase is 0.12 Å per pixel. The ePIE method” is implemented with
modifications to exclude bad pixels in the diffraction patterns. The algorithm uses a
multiplicative approximation, which is a generalization of the strong-phase approximation,
allowing for both a strong phase object and a variable amplitude term.
It aims to minimize the Euclidean distance between reconstructed and measured
diffraction patterns. In general, the convergence of reconstructions depends on
the number of iterations and update parameters for the transmission function and
the probe function. Because experimental data contain noise and other sources of
errors, fast convergence may introduce noisy artefacts and reduce reconstruction
quality**“’. To alleviate this problem, we used a small update parameter (0.1) for
the transmission function and limited the reconstruction of the probe function
to data taken in areas with minimal contamination. For our experimental conditions,
with cutoff angle θmax = 80 mrad, the thickness limit dz for treating the
sample as a projected potential and neglecting beam propagation is estimated to be
dz = λ/[2sin²(θmax/2)] ≈ 1.3 nm°°, which is larger than the thicknesses of both the
monolayer MoS₂ (3.1 Å) and the bilayer MoS₂ (9.8 Å).
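For readers unfamiliar with ePIE, a single update at one scan position can be sketched as follows. This is a minimal illustration of the generic ePIE step under the multiplicative approximation, not the implementation used for the reconstructions here. The probe update parameter `beta`, the optional `good_pixels` mask (one possible way to exclude bad detector pixels) and all function names are assumptions made for the example. Only the value 0.1 for the transmission-function update parameter comes from the text.

```python
import numpy as np


def epie_update(obj_patch, probe, measured_amplitude,
                alpha=0.1, beta=0.1, good_pixels=None):
    """One ePIE update at a single scan position (illustrative sketch).

    obj_patch, probe: complex 2D arrays of equal shape.
    measured_amplitude: square root of the measured diffraction intensity,
        in the same (unshifted) ordering as np.fft.fft2 output.
    good_pixels: optional boolean mask; where False, the modelled amplitude
        is kept so that bad pixels do not constrain the update (assumption).
    """
    exit_wave = probe * obj_patch                 # multiplicative approximation
    far_field = np.fft.fft2(exit_wave)
    amplitude = measured_amplitude
    if good_pixels is not None:
        amplitude = np.where(good_pixels, measured_amplitude, np.abs(far_field))
    # Keep the modelled phase and impose the measured modulus.
    revised_far_field = amplitude * np.exp(1j * np.angle(far_field))
    difference = np.fft.ifft2(revised_far_field) - exit_wave
    new_obj = obj_patch + alpha * np.conj(probe) / np.max(np.abs(probe)) ** 2 * difference
    new_probe = probe + beta * np.conj(obj_patch) / np.max(np.abs(obj_patch)) ** 2 * difference
    return new_obj, new_probe


# Consistency check: if the measurement already matches the model,
# the update leaves the object and the probe unchanged.
rng = np.random.default_rng(0)
true_obj = np.exp(1j * 0.3 * rng.random((32, 32)))          # weak pure-phase object
true_probe = rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32))
amplitude = np.abs(np.fft.fft2(true_probe * true_obj))
same_obj, same_probe = epie_update(true_obj, true_probe, amplitude)
```

A small `alpha` slows convergence but, as noted above, limits the amplification of noise from any single diffraction pattern.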

Fourier resolution estimate. All diffractograms (Fourier intensity) of bright- 
field, ADF, iCoM and ptychography reconstructions in Fig. 2 were calculated 
from images constructed from 128 × 128 diffraction patterns (see Extended Data
Fig. 6). To visualize diffraction spots, a periodic and smooth decomposition*! was 
applied to images to reduce artefacts caused by edge discontinuities. Next, the real- 
space image was multiplied by a Gaussian function, making the diffraction spots 
slightly larger and thus more visible. The Fourier intensity was rescaled to enhance 
the intensity of higher-order spots for better visualization. 

4D data simulation. All datasets used for dose-cutoff simulations were generated
by the μSTEM software, which models the atomic potential using scattering
factors given in ref. °°. For the in-focused probe, 21 × 24 diffraction patterns
with a 0.45-Å scan step size were simulated at 80-keV beam energy and with
21.4-mrad aperture size. The thermal diffuse scattering effect was included with
the frozen-phonon approximation. The diffraction patterns were further corrupted
with Poisson noise determined by the simulated beam dose. Extended Data Fig. 7
shows selected ePIE reconstructions at different beam currents (0.01-100 pA) and
cutoff angles (1α-4α). Simulations (Fig. 6c) of the large probe profile used an 80-nm
defocus and 10.04-Å scan step size.
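The noise step described above can be illustrated with a short sketch. It assumes that a noiseless simulated pattern is first scaled so that its total equals the expected number of electrons per probe position, and that each pixel is then replaced by a Poisson sample. The Gaussian test pattern and the function name are hypothetical stand-ins, not μSTEM output. The 62 electrons per pattern correspond to 0.01 pA for an assumed 1-ms exposure.

```python
import numpy as np


def add_poisson_noise(pattern, electrons_per_pattern, rng):
    """Return a Poisson-sampled copy of a noiseless diffraction pattern.

    pattern: non-negative intensities (arbitrary units).
    electrons_per_pattern: expected total electron count for this pattern.
    """
    expected_counts = pattern / pattern.sum() * electrons_per_pattern
    return rng.poisson(expected_counts)


rng = np.random.default_rng(0)
# Hypothetical stand-in for a simulated pattern: a smooth disk-like profile.
y, x = np.mgrid[-64:64, -64:64]
clean_pattern = np.exp(-(x ** 2 + y ** 2) / (2.0 * 10.0 ** 2))
# About 62 electrons per pattern: 0.01 pA for an assumed 1-ms exposure.
noisy_pattern = add_poisson_noise(clean_pattern, 62.0, rng)
print(noisy_pattern.sum(), "electrons recorded in this realization")
```

At such low counts most pixels record zero electrons, which is why the reconstructions become dose-limited.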

Effect of chromatic aberrations. Ptychography is less sensitive to chromatic blur 
than is direct phase imaging™, but it is still sensitive, especially at low doses. 4D 
datasets including chromatic aberrations were simulated for a Gaussian energy 
spread with full-width at half-maximum of ΔE = 1.1 eV, a chromatic aberration
coefficient of Cc = 1.72 mm and a beam energy of 80 keV. Two different
probe-forming aperture sizes, 21.4 mrad and 35 mrad, were chosen to reflect 
conditions under which chromatic blur is moderate and strong, respectively. The 
dose dependence of ePIE reconstructions for datasets with and without chromatic 
aberration is shown in Extended Data Fig. 8. Under the experimental conditions, 
the influence of chromatic aberration is visually negligible. At lower doses, the 
larger chromatic blur for the larger aperture leads to a worse reconstruction. 
Data availability. All relevant data are available from the corresponding author 
on request. 

Code availability. Code developed at Cornell, including visualization software 
for 4D datasets, is available from the corresponding author on request. μSTEM
was developed at the University of Melbourne and can be downloaded from
http://tcmp.ph.unimelb.edu.au/mustem/muSTEM.html. 
Extended Data Fig. 2 | Line profiles through atom pairs in the twisted
bilayer MoS₂. Line profiles are from atom pairs in Fig. 3, with the
respective subregions shown on the left. a-c, The measured peak-peak
Extended Data Fig. 3 | ADF image and line profiles through atom pairs
in the twisted bilayer MoS₂. a, ADF image synthesized from the 4D
diffraction dataset. b, Phase of the transmission function reconstructed
by ptychography. The yellow marker indicates a pair of atoms that is
predicted to have a separation of 0.2 Å on the basis of the structural model,
but cannot be resolved explicitly in our reconstruction. For a more detailed
comparison, a red box is placed over corresponding regions in a and b.
c, Enlarged image of the red boxed region in b, with the false colour scale
of Fig. 3. d, Line profiles across the atom pairs labelled with dashed lines in c.
The peak-peak separations are overlaid near the line profiles.
Extended Data Fig. 4 | Reconstructed amplitude and phase of
monolayer MoS₂ at different cutoff angles. Both the amplitude (left
panels) and phase (right panels) of the reconstructed transmission
function show the atomic structure of monolayer MoS₂. Image resolution
improves as the cutoff angle increases. Amplitude modulations are
relatively weak, deviating by only a few per cent from a pure phase object
(that is, an object function with unit amplitude).
Extended Data Fig. 5 | Comparison between ptychography techniques
and low-angle ADF imaging of graphene. a, b, Ptychographic
reconstructions of simulated data with an in-focused probe, using the
WDD (a) and ePIE (b) methods. c, Low-angle ADF (integrating from
1α to 4α) reconstruction using the same simulated datasets. Both
ptychographic methods show similar reconstructions and are about 10
times more dose-efficient than the low-angle ADF technique. Beam
energy, 80 keV; aperture size (α), 21.4 mrad.
Extended Data Fig. 6 | Influence of scanning drift and contamination.
a-c, ADF image (a), iCoM image (b) and phase of transmission
reconstructed by full-field ptychography (c) using 128 × 128 diffraction
patterns, covering a field of view of 2.7 nm × 2.7 nm. The ADF and
iCoM reconstructions both suffer from stripe artefacts and large contrast
variations. In the ptychographic reconstruction, scanning drift distorts
and blurs reconstructed atoms in the vicinity of the scan distortion, but the
overall resolution away from the distortion remains higher than the other
imaging modes.
Extended Data Fig. 7 | Effect of dose and cutoff angles on ptychographic
reconstructions of monolayer MoS₂ using simulated diffraction
patterns. At high beam current, the resolution of the ptychography
reconstruction is fundamentally determined by the collection angle of the
detector. As the beam current decreases, the resolution becomes dose-limited
and noise artefacts start to appear in the ePIE reconstruction.
Beam energy, 80 keV; aperture size (α), 21.4 mrad.
Extended Data Fig. 8 | Effect of chromatic aberrations at different
electron doses for ptychographic reconstructions of monolayer MoS₂
using simulated datasets at 80 keV. Two convergence semi-angles are
shown, 21.4 mrad (left two columns) and 35 mrad (right two columns),
representing conditions under which chromatic aberrations have moderate
and large effects on the incident probe shape, respectively (Cc = 1.72
mm, ΔE = 1.1 eV). 21.4 mrad is also the experimental convergence
angle. The incident electron dose levels are an infinite dose (top row),
the experimental dose of 1.16 × 10⁶ electrons per Å² (middle row) and a
low dose of 10⁴ electrons per Å² (bottom row). In the presence of noise,
chromatic aberrations degrade the phase range of the reconstruction
compared with the achromatic data. The data for the larger convergence
semi-angle are more strongly affected. At infinite and experimental doses,
ptychographic reconstructions with and without chromatic aberration
are visually similar for both convergence angles. At low dose and with
chromatic aberration, the reconstructed atoms are broadened, and distinct
artefacts appear for a convergence angle of 35 mrad.
Insights into clonal haematopoiesis from
8,342 mosaic chromosomal alterations


Po-Ru Loh, Giulio Genovese, Robert E. Handsaker, Hilary K. Finucane, Yakir A. Reshef,
Pier Francesco Palamara, Brenda M. Birmann, Michael E. Talkowski, Samuel F. Bakhoum,
Steven A. McCarroll & Alkes L. Price


The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342
mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK
Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six
loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At
three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance
(5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be
objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated
with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide
range of effects on human health.


Clonal expansions of blood cells containing somatic mutations are 
often observed in individuals without cancer’~!°. Consistent with the 
idea that clonal mosaicism can be a precancerous state, detectable 
mosaicism confers a more than tenfold increased risk of future haema- 
tological malignancy’ and often involves pro-proliferative mutations. 
Several studies have suggested that inherited variation can influence 
the likelihood of clonal mosaicism'>!*!. 

The limiting factor in almost all studies of clonal mosaicism has 
been sample size, with earlier insights arising from analyses of up to 
around 1,000 mosaic events. Two key factors determine the number of 
detectable mosaic mutations: the number of individuals analysed, and 
the ability to detect clonal expansions present at low-to-modest cell 
fractions. Here we describe insights from an analysis of 8,342 mosaic 
chromosomal alterations (mCAs) which we identified in single nucle- 
otide polymorphism (SNP) array data from 151,202 UK Biobank 
participants” using a sensitive algorithm we developed to make use 
of long-range haplotype phase information (building on published 
work®). We also draw upon data on health outcomes during 4-9 years 
after DNA sampling. 

These data provide insights into clonal expansion, including mech- 
anisms by which inherited variants at several loci act in cis to generate 
or propel mosaicism. We also identify specific mCAs that associate 
strongly with future haematological malignancies. 


Mosaic chromosomal alterations in UK Biobank 

We analysed allele-specific SNP-array intensity data previously 
obtained by genotyping blood-derived DNA from 151,202 UK 
Biobank participants (40-70 years of age)*’; 607,525 genotyped 
variants remained after quality control (see Methods). We detected mCAs 
at cell fractions as low as 1% by using long-range phase information
that is uniquely available in the UK Biobank”*™*. Intuitively, accurate
phasing allows the detection of subtle imbalances in the abundances 
of two haplotypes by combining allele-specific information across a 
very large number of SNPs (Extended Data Fig. 1). To make maximal 
use of phase information, we developed a new statistical method for 
phase-based mCA detection (see Methods and Supplementary Note 1). 
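The intuition that phasing lets weak allelic imbalances accumulate across many SNPs can be illustrated with a toy simulation. The sketch below is not the statistical method developed in this work (which is described in Methods and Supplementary Note 1). It simply assumes that each heterozygous SNP shows a small B-allele frequency shift whose sign depends on which haplotype carries the B allele, plus independent Gaussian noise. All parameter values and names are hypothetical.

```python
import numpy as np


def imbalance_z_scores(baf_deviation, phase_sign, noise_sd):
    """Compare a phase-aware and a phase-blind summary of allelic imbalance.

    baf_deviation: per-SNP B-allele frequency minus 0.5 at heterozygous sites.
    phase_sign: +1 if the B allele lies on haplotype 1, otherwise -1.
    noise_sd: per-SNP measurement noise (assumed known for this toy example).
    """
    standard_error = noise_sd / np.sqrt(baf_deviation.size)
    z_phased = np.mean(phase_sign * baf_deviation) / standard_error
    z_unphased = np.mean(baf_deviation) / standard_error
    return z_phased, z_unphased


rng = np.random.default_rng(0)
n_snps, shift, noise_sd = 2000, 0.005, 0.03   # hypothetical values
phase_sign = rng.choice([-1.0, 1.0], size=n_snps)
baf_deviation = phase_sign * shift + rng.normal(0.0, noise_sd, size=n_snps)
z_phased, z_unphased = imbalance_z_scores(baf_deviation, phase_sign, noise_sd)
print(f"phase-aware z = {z_phased:.1f}, phase-blind z = {z_unphased:.1f}")
```

In this toy setting the per-SNP shift is six times smaller than the noise, yet aligning the signs by haplotype yields a strong aggregate signal, whereas the phase-blind average cancels out. Phase switch errors, which a real method must handle, are ignored here.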

We detected 8,342 mCAs (in 7,484 of the 151,202 individuals ana- 
lysed) at an estimated false discovery rate (FDR) of 6-9% (Fig. 1, 
Extended Data Fig. 2, Supplementary Table 1, and Supplementary 
Notes 2, 3; validation rates could differ from this FDR estimate). 
We confidently classified 71% of the detected mCAs as either loss, 
copy-number neutral loss of heterozygosity (CNN-LOH), or gain; 
for the other 29% of events, copy-number state could not be inferred 
definitively (Fig. 2a and Supplementary Note 1). Most detected mCAs 
(5,901 of 8,342) were present at inferred cell fractions below 5% 
(Supplementary Note 4) and would have been undetectable without 
long-range phasing (Supplementary Note 5). The genomic distribu- 
tion of detected mCAs was broadly consistent with those found in 
previous studies’, as was the observation that individuals acquire 
multiple mCAs much more frequently than expected by chance (Fig. 2b, 
Extended Data Fig. 3, Supplementary Tables 2, 3, and Supplementary 
Note 6); differences (for example, in relative rates of del(20q) calls”°) 
could be explained by differing methodological sensitivity or genotyping 
platforms (Supplementary Note 4). 

Fig. 1 | Mosaic chromosomal alterations detected in 151,202 UK
Biobank participants. Each horizontal line corresponds to an mCA; a
total of 5,562 autosomal events in 4,889 unique individuals are displayed.
We detected an additional 2,780 chromosome X events in females (mostly
whole-chromosome losses). Detected events are colour-coded by copy
number. Focal deletions are labelled in red with names of putative target
genes. Loci containing inherited variants influencing somatic events in cis
are labelled in the colour of the mCA (red for del(10q)-associated FRA10B,
green for CNN-LOH-associated loci). Enlarged per-chromosome plots are
provided in Supplementary Note 2.

Commonly deleted regions (CDRs) below 1 Mb in length may
indicate haploinsufficient tumour-suppressor genes for which loss of
one copy promotes cell proliferation”. Focal deletions most frequently
targeted 13q14, DNMT3A, and TET2, as previously observed”*®; we
further observed that most CNN-LOH events on 13q, 2p, and 4q
spanned the same CDRs (Fig. 1 and Supplementary Note 7). We
detected new CDRs at ETV6, NF1, and CHEK2, which are commonly
mutated in cancers, and at RPA2 and RYBP. We also detected a CDR 
at 16p11.2 overlapping a region whose deletion is a known risk factor 
for autism and other neuropsychiatric phenotypes, though we did not 
detect this mCA among 2,079 sequenced genomes from the Simons 
Simplex Collection (SSC)”*?” (Supplementary Note 8). Deletions 
tended to be concentrated on chromosomes that are seldom dupli- 
cated?8 (Fig. 2c and Supplementary Table 1), supporting the theory 
that cumulative haploinsufficiency and triplosensitivity shape clonal 
evolution”. 

We found several notable exceptions to a general pattern in 
which acquired mutations are most common in the elderly and in 
males'””8 (Fig. 2d and Supplementary Table 4). Loss of chromo- 
some X in females*° was by far the most common event we detected 
(Supplementary Table 1 and Supplementary Note 2), with frequency 
increasing markedly with advancing age (Fig. 2d and Supplementary 
Table 4). (We did not examine loss of chromosome Y, which has been 
studied elsewhere”’.) Stratification of autosomal mCAs by location and 
copy number revealed an unexpected relationship: although most gain 
events were (as expected) enriched in elderly individuals and in males, 
CNN-LOH events tended to affect both sexes equally (Fig. 2e and 
Supplementary Table 5). Three mCAs exhibited unusual age and sex 
distributions (FDR 0.05; binomial and z-tests): gains on chromosome 
15 were much more frequent in elderly males*!; and 16p11.2 deletions 
and 10q terminal deletions were much more frequent in females and 
exhibited frequency unrelated to age. Age-independent events could 
in principle occur early in development or take less time to reach high 
cell fractions; sex-specific effects (which we replicated in previous data 
sets!8; Supplementary Note 3) will require future work to explain. 

Some acquired mutations could in principle arise or be selected 
within specific haematopoietic lineages. We tested this hypothesis 
by examining individuals in the top percentile for counts of lymphocytes,
basophils, monocytes, neutrophils, red blood cells, or platelets.
We identified many mCAs that were significantly concentrated (FDR 
0.05; Fisher's exact test) in one or more of these subsets of the cohort 
(Fig. 2f and Supplementary Table 6). Consistent with the idea that these 
relationships might reflect clonal selection in specific blood cell types, 
mutations commonly observed in chronic lymphocytic leukaemia 
(CLL)*?73 were enriched among individuals with high lymphocyte 
counts, and JAK2-related 9p events (which are commonly observed 
in myeloproliferative neoplasms (MPNs)) were most common among 
individuals with high myeloid cell counts. While future work will be 
needed to replicate and further explore these findings, our results 
suggest that mCAs may produce blood-composition phenotypes in 
individuals with no known malignancy. 


Inherited variants affect acquisition of nearby mCAs 

To identify inherited influences on mCA formation or selection, we 
performed chromosome-wide scans for associations between mCAs 
and germline variants on the same chromosome (see Methods). This 
analysis revealed four loci at which inherited variation strongly associ- 
ated with the acquisition of genomically nearby autosomal mCAs, and 
two loci on chromosome X associated with X loss in females (Table 1, 
Figs. 3, 4). We also replicated an earlier association of the JAK2 46/1 
haplotype with 9p CNN-LOH!*-!8:?° (Extended Data Fig. 4). To 
identify mechanisms that might underlie these associations, we fine- 
mapped these loci using whole-genome sequence (WGS) data and 
studied the phase of risk alleles relative to associated chromosomal 
alterations in cis. 

Somatic terminal 10q deletions associated strongly (P = 6.1 × 10⁻⁴²;
Fisher’s exact test) with the common SNP rs118137427 near FRA10B,
a known genomic fragile site**° at the estimated common breakpoint
of the 10q deletions (Table 1 and Fig. 3a). All 60 individuals with these 
mosaic 10q deletions had inherited the rs118137427:G risk allele (the 
allele frequency is 5% in the population), which was always inher- 
ited on the same chromosome that subsequently acquired a terminal 
deletion (Table 1). 

To identify a causal variant potentially tagged by the rs118137427:G 
risk allele, we searched for acquired 10q deletions in WGS data from 
520 SSC families (see Methods). We identified two parent-child duos in 
which both parent and child had acquired the 10q terminal deletion (in 
mosaic form); all four individuals possessed expanded AT-rich repeats 
at FRA10B on the rs118137427:G haplotype background (P=0.01; 
Fig. 3c). Further evidence that the rs118137427:G risk allele tags an 
unstable version of the FRA10B locus* was provided by analysis of 
the variable number tandem repeat (VNTR) sequence at FRA10B (in
all 2,079 individuals). This analysis revealed a diversity of novel VNTR 
sequence motifs (12 distinct primary repeat units carried by 26 individ- 
uals from 14 families), all on the rs118137427:G haplotype background 
(Extended Data Fig. 5a, b and Supplementary Note 8). (The VNTR 
motifs did not associate with autism status in the SSC cohort.) The 
motifs had lengths of 38, 39, 42, and 43 bp and exhibited evidence of 
repeat expansion (probably more than 75 copies in the longest alleles**); 
by contrast, the hg19 reference sequence at FRA10B contains three cop- 
ies of a 40-bp repeat. Imputing the VNTRs into the UK Biobank showed 
that they explained 24 of 60 del(10q) cases, despite being present in only 
about 0.7% of the cohort (Supplementary Table 7). Notably, individuals 
with del(10q) were as young as other UK Biobank participants, and 
51 of 60 were female (binomial P= 1.8 x 107”) (Fig. 3b); these unu- 
sual patterns (which were shared with 16p11.2 deletions) will require 
further study (Supplementary Note 8). 

CNN-LOH events on chromosome (chr) 1p strongly associated
(P = 6.2 × 10⁻¹⁶, lead SNP rs144279563) with three independent, rare
risk haplotypes (allele frequencies = 0.01-0.05%) at the MPL proto-oncogene
at 1p34.1; the three haplotypes increased risk for 1p CNN-LOH
by factors of 53, 63, and 103 (95% confidence intervals (CIs): 28-99,
29-139, and 35-300, respectively) (Table 1, Fig. 4a, and Supplementary
Table 8). Other individuals with 1p CNN-LOH mosaicism also shared
long haplotypes containing MPL, suggesting the existence of additional
very rare risk variants (Extended Data Fig. 5c). Notably, although gain-of-function
mutations in MPL lead to myeloproliferative neoplasms*””*,
the lead SNP on one haplotype, rs369156948, is a protein-truncating
variant (PTV) in MPL with no association to haematological malignancies
in the UK Biobank (0 cases among 36 carriers).

Fig. 2 | Distributional properties of detected mCAs. a, log₂ R ratio
(LRR), measuring total allelic intensity, scales roughly linearly with
B-allele frequency (BAF) deviation, measuring relative allelic intensity,
among events with each copy number’®. b, Most individuals with a
detected autosomal mCA have only one event, although a larger number
than expected (441 versus 100) have multiple events. Several pairs of mCA
types co-occur much more frequently than expected by chance; edge
weights in the co-occurrence graph scale with enrichment. c, Autosomes
with more gain events tend to have fewer loss events (excluding deletions
involving V(D)J recombination on chromosomes 14 and 22); Spearman’s
test on n = 22 autosomes. d, Fractions of individuals with at least one
detected autosomal event increase steadily with age, and this trend is
even more pronounced for X chromosome events in females. Error
bars, 95% CI. e, Carriers of different mCA types have different age and
sex distributions. Error bars, s.e.m. f, Different mCAs are significantly
enriched (FDR 0.05) among individuals with anomalous blood counts
in different blood lineages (adjusted for age, sex, and smoking status;
see Methods). Numeric data including exact sample sizes used to compute
error bars are provided in Supplementary Tables 1-6.

We were able to identify a likely mechanism for selection of the
CNN-LOH events involving MPL. For all 16 events for which we could
confidently phase the inherited risk allele relative to the somatic CNN-LOH,
the CNN-LOH mutation had replaced the clonal haematopoiesis
risk allele with the reference allele (binomial P = 3 × 10⁻⁵; Table 1
and Fig. 4a). These results suggest that, among individuals with rare
inherited variants that reduce MPL function, recovery of normal MPL
gene activity via CNN-LOH provides a proliferative advantage.
CNN-LOH events on chr11q associated (P = 7.4 × 10⁻⁹, OR = 41
(18-94)) with a rare risk haplotype (allele frequency = 0.07%) 


Table 1 | Novel genome-wide significant associations of mCAs with inherited variants

                                                                                  GWAS                            Risk allelic shift in hets
SV type               Locus          Variant          Location  Allelesᵃ  RAFᵇ    P             OR (95% CI)       Nincᶜ  Ndecᵈ  P
cis associations
10q loss              FRA10B         rs118137427ᵉ     10q25.2   A/G       0.05    6.1 × 10⁻⁴²   18 (12-26)        0      43     2.3 × 10⁻¹³
1p CNN-LOH            MPL            rs144279563      1p34.1    C/T       0.0005  6.2 × 10⁻¹⁶   53 (28-99)        0      9      3.9 × 10⁻³
                                     rs182971382      1p34.1    A/G       0.0003  3.0 × 10⁻¹⁴   63 (29-139)       0      4      1.3 × 10⁻¹
                                     rs369156948ᶠ     1p34.2    C/T       0.0001  7.3 × 10⁻⁸    103 (35-300)      0      3      2.5 × 10⁻¹
11q CNN-LOH           ATM            rs532198118      11q22.3   A/G       0.0007  7.4 × 10⁻⁹    41 (18-94)        6      0      3.1 × 10⁻²
15q CNN-LOH and loss  TM2D3, TARSL2  70-kb deletionᵍ  15q26.3   CN=1/0    0.0003  1.3 × 10⁻⁸⁶   698 (442-1,102)   39     2      7.8 × 10⁻¹⁰
chrX loss             DXZ1           rs2942875        Xp11.1    T/C       0.55    9.7 × 10⁻⁴    1.09 (1.04-1.15)  423    796    6.6 × 10⁻²⁷
                      DXZ4           rs11091036       Xq23      C/G       0.73    1.1 × 10⁻³    1.10 (1.04-1.17)  369    555    1.0 × 10⁻⁹
trans associations
chrX loss             SP140L         rs725201         2q37.1    G/T       0.56    9.2 × 10⁻¹⁰   1.17 (1.12-1.24)
                      HLA            rs141806003      6p21.33   C/CAAAG   0.34    6.1 × 10⁻¹⁹   1.18 (1.12-1.25)

Results of two independent association tests are reported: a Fisher test treating individuals with a given mCA type as cases; and (for cis associations) a binomial test for biased allelic imbalance in
heterozygous cases (hets; see Methods). All loci reaching P < 1 × 10⁻⁸ in either test are reported; each cis association detected by one test reached nominal (P < 0.05) significance in the other test. At
significant loci, the lead associated variant as well as additional independent associations reaching P < 1 × 10⁻⁶ are reported.

ᵃRisk-lowering/risk-increasing allele.
ᵇRisk allele frequency (in UK Biobank participants with European ancestry).
ᶜNumber of mosaic individuals heterozygous for the variant in which the somatic event shifted the allelic balance in favour of the risk allele (by duplication of its chromosomal segment and/or loss of the homologous segment).
ᵈNumber of mosaic individuals heterozygous for the variant in which the somatic event shifted the allelic balance in favour of the non-risk allele.
ᵉrs118137427 tags expanded repeats at FRA10B (Fig. 3).
ᶠrs369156948 is a nonsense mutation in MPL.
ᵍThis deletion spans chr15:102.15-102.22 Mb (hg19) and is tagged by rs182643535.
Fig. 3 | Repeat expansions at fragile site FRA10B driving breakage at 
10q25.2. a, Germline variants at 10q25.2 associate strongly with terminal 
10q mosaic deletion (Fisher’s exact test, n = 120,664 individuals). Left 
boundaries of the deletions are called with error; true breakpoints are 
probably near-identical (Supplementary Note 4). b, UK Biobank carriers 
of terminal 10q deletion are predominantly female (top; 51 of n=60 
individuals; error bars, 95% CI) with age distribution similar to the overall 
study population (bottom; violin plot centres, means; error bars, 95% CI). 
c, WGS samples with terminal 10q deletion (two parent-child duos; right) 
carry inherited expanded repeats at FRA10B. 


surrounding the ATM gene at 11q22.3 (Table 1, Fig. 4b, and 
Supplementary Table 8). For all six CNN-LOH events for which we 
could confidently phase the risk allele relative to the somatic mutation, 
the LOH mutation had caused the rare risk allele to become homozy- 
gous, suggesting that the risk allele confers a proliferative advantage in 
the homozygous state (Table 1 and Fig. 4b). (This dynamic contrasts 
with that of MPL, at which the rare, inherited risk haplotypes were 
eliminated by LOH and clonal selection.) While sequencing would be 
required to identify a causal variant, ATM is a clear putative target: ATM 
encodes a DNA-damage response kinase that promotes DNA repair 
and limits cell division, and ATM is often inactivated by mutation in 
cancers. In our analysis, acquired 11q deletions also appeared to 
target ATM (Fig. 1 and Supplementary Note 2). 

CNN-LOH and loss events at chr15q associated strongly 
(P = 1.3 × 10⁻⁸⁶) with a rare, inherited 70-kb deletion (allele
frequency = 0.03%) that spanned all of TM2D3 and part of TARSL2
at 15q26.3 (Table 1, Fig. 4c, and Extended Data Figs. 6, 7). For 39 
of 41 events with high-confidence phase calls, the CNN-LOH or 
loss was inferred to produce homozygosity or hemizygosity of the 
inherited deletion, removing the reference (non-deletion) allele from 
the genome. (This dynamic resembles that of ATM in suggesting 
clonal selection for the rare, inherited risk allele.) The 70-kb dele- 
tion increased risk of 15q mosaicism by a factor of 698 (442-1,102): 
45 of 89 carriers exhibited detectable 15q events (32 CNN-LOH, 2 
loss, 11 ambiguous between CNN-LOH and loss). Notably, the 70-kb 
deletion was sometimes inherited on an allele that also had an inde- 
pendent 290-kb duplication of the locus (Extended Data Fig. 6); on 
this more complex allele, TM2D3 and TARSL2 gene dosage were nor- 
mal. Carriers of the more complex allele did not exhibit predisposition 
to mCAs. Further study will be required to determine a proliferative 
mechanism involving TM2D3, TARSL2, or noncoding elements within 
the region. 

The high penetrances (up to 50%) for the above cis associations led 
us to suspect that some risk-allele carriers might harbour multiple 
Fig. 4 | Novel loci associated with mCAs in cis due to clonal selection.
a, MPL. b, ATM. c, TM2D3-TARSL2. At each locus, one or more
inherited genetic variants predispose chromosomal mutations to create a
proliferative advantage. Bottom, genomic modifications; top, association 
P values (Fisher’s exact test, n = 120,664 individuals). Independent lead 
associated variants are labelled, and variants are coloured according to 
linkage disequilibrium (LD) with lead variants (in shades of red, gold, or 
green; variants in grey are not in LD with lead variants). In c, the differing 
arrow weights to CNN-LOH and loss events indicate that CNN-LOH is the 
more common scenario (both in the population and among carriers of the 
risk variant). 


subclonal cell populations with the associated alterations. Using a 
modified version of our methodology, we detected 39 individuals 
who had acquired two or more CNN-LOH mutations (with differ- 
ent breakpoints and allelic fractions) involving the same chromo- 
some (Extended Data Fig. 8 and Supplementary Note 1). For all 39 
individuals with multiple same-chromosome CNN-LOH events, all 
events involved recurrent selection of the same haplotype (in different 
clones). Of these 39 haplotypes, 16 carried a risk allele identified by 
our association scans, 13 appeared to involve other (undiscovered) 
alleles at the same loci, 5 duplicated 13q14 deletions, and 5 involved 
other genomic loci (Extended Data Fig. 8). This result indicates 
strong proliferative advantage conferred by CNN-LOH in these indi- 
viduals and suggests that mitotic recombination occurs sufficiently 
frequently to yield multiple opportunities for clonal selection in indi- 
viduals carrying inherited haplotypes with different proclivities for 
proliferation. 

We also found two common variants on chromosome X that 
weakly increase risk of X loss while strongly influencing (in heterozy- 
gous females) which X chromosome is lost in the expanded clone. 
These involved a strong association (P = 6.6 × 10⁻²⁷, 1.9:1 bias in
the lost haplotype) at Xp11.1 near DXZ1 and a weaker association
(P = 1.0 × 10⁻⁹, 1.5:1 bias in the lost haplotype) at Xq23 near DXZ4
(Table 1, Supplementary Table 9, and Supplementary Note 9). These
associations do not appear to be explained by biased X chromosome
inactivation (Supplementary Table 10) and hint at yet another
mechanism, different from those we have described.


Trans associations with mCAs 

Genetic variants near genes involved in cell proliferation and cell cycle
regulation predispose to male loss of Y, and female loss of X is
also heritable (h² = 26% (17.4-36.2%) in sib-pair analysis), but no
associations for X loss have previously been reported, to our knowledge.
We confirmed the heritability of female X loss by performing
BOLT-REML analysis (see Methods), obtaining a SNP-heritability estimate
of h_g² = 10.6% (s.e. 3.6%). Genome-wide association analysis for trans
variants influencing X loss further revealed two genome-wide signifi- 
cant associations at the SP140L and HLA loci (Table 1). 
Fig. 5 | Associations between mCAs and incident cancers and mortality.
a, Multiple mCA types confer increased risk of incident blood cancers
diagnosed >1 year after DNA collection in n = 109,819 individuals
with normal blood counts at assessment (Cochran-Mantel-Haenszel
test adjusting for age and sex; error bars, 95% CI). b, A logistic model
including mosaic status for 13q and trisomy 12 events along with other
risk factors achieves high out-of-sample prediction accuracy for incident
CLL (n = 36 cases and 113,923 controls with no cancer history). Lym#, log
lymphocyte count. c, Time to malignancy tracks inversely with clonal cell
fraction in n = 46 individuals with detectable clonality (of any mCA) who
were diagnosed with CLL after assessment (one-sided Pearson's test).
d, Loss, gain, and CNN-LOH events (on any autosome) all confer
increased mortality risk in n = 128,854 individuals with no cancer history
and n = 15,782 with prevalent cancers (error bars, 95% CI). Sample
exclusions are detailed in the Methods. Numeric data are provided in
Supplementary Tables 12 and 13.


Germline variants affecting cancer risk or chromosome-maintenance 
phenotypes could in principle increase the risk of clonal expansions. 
We considered 86 variants that have been implicated in previous 
genome-wide association studies (GWAS) on CLL, MPN, Y loss, 
clonal haematopoiesis, and telomere length and tested these variants 
for trans association with seven classes of mCAs, stratifying events by 
copy number and by autosome versus X chromosome. Four variants 
reached Bonferroni significance (P < 8.3 × 10⁻⁵): two linked variants
in TERT, a rare frameshift mutation in CHEK2, and a
low-frequency 3′ untranslated region (UTR) SNP in TP53
(Supplementary Table 11). The TERT and CHEK2 variants associated 
with multiple types of autosomal event; by contrast, the TP53 SNP 
primarily associated with losses (both focal autosomal deletions and 
X losses). Carriers of the CHEK2 frameshift mutation were especially 
prone to developing multiple mCAs (one-sided binomial P= 0.008): 
8 of 33 carriers with detected autosomal mosaicism had two or more 
mCAs, generally in multiple clones. 


Mosaic chromosomal alterations and subsequent health 

Cancer-free individuals with detectable mosaicism (at any locus) have a
more than tenfold elevated risk of subsequent haematological cancer.
For CLL, a slowly progressing cancer that is known to be preceded
by clonal mosaicism years before progression, mosaic alterations
observed in patients who go on to develop CLL occur at the same loci
as those observed in patients with CLL. Using data on health
outcomes for UK Biobank participants 4-9 years (median 5.7 years)
after DNA sampling, we identified nine specific mCAs that were
significantly associated (FDR 0.05) with subsequent haematological 
cancer diagnoses (more than 1 year after DNA collection) in analyses 
corrected for age and sex and restricted to individuals with normal 
blood counts at assessment (Fig. 5a and Supplementary Table 12),
confirming and providing additional resolution to previous findings.
A logistic model combining mosaic status for CLL-associated events
with other risk factors (age, sex, CLL genetic risk score, and
lymphocyte count) achieved high CLL prediction accuracy (area under
the curve (AUC) = 0.81) in tenfold cross-validation (Fig. 5b and
Extended Data Fig. 9). Most of this predictive power came from early 
clones with trisomy 12, which we could detect at very low cell fractions 
(Extended Data Figs. 9, 10). Individuals with incident CLL exhibited 
clonality up to six years before diagnosis, and clonal fraction inversely 
correlated with time to malignancy (Fig. 5c). We further observed 
that detectable mosaicism roughly doubled risk for all-cause mortality 
(corrected for age, sex, and smoking status). This association was 
explained only partly by cancer deaths (Fig. 5d and Supplementary 
Table 13) and could reflect effects on cardiovascular illness, although 
further study is needed to explore this finding and rule out residual 
confounding. 


Discussion 

Mosaicism typically results from mutation followed by selective
proliferation, and our results uncover diverse biological mechanisms
underlying this transformation. We identified very rare inherited var- 
iants that affect either the likelihood of mutation (at FRA10B) or its 
proliferative impacts (due to CNN-LOH in cis), and we also observed 
trans influences on clonal haematopoiesis in the cell cycle genes TP53, 
CHEK2, and TERT. Our findings of cis risk loci for CNN-LOH expan- 
sions are particularly noteworthy: while some CNN-LOH expansions 
have previously been observed to provide a second hit to a frequently 
mutated locus or to disrupt imprinting, here we observed that CNN-LOHs
can also achieve strong selective advantage by duplicating or
removing inherited alleles. The high penetrances (up to 50%) of the 
inherited CNN-LOH risk variants we identified challenge what is usu- 
ally seen as a fundamental distinction between inherited alleles and 
(more capricious) acquired mutations. A large fraction of carriers of 
the inherited alleles subsequently acquire and then clonally amplify 
the mutations in question. The high penetrances imply that mitotic 
recombination is sufficiently common to predictably unleash latent, 
inherited opportunities for clonal selection of homozygous cells during 
the lifespan of an individual, corroborating a recent observation of this 
phenomenon in skin. Similarly, we observed Mendelian inheritance
patterns for 10q breakage at FRA10B, despite this event involving an
acquired mutation. 

Clonal expansions exhibit varying levels of proliferation and biological 
transformation and thus have a spectrum of effects on health. We 
found that many mCAs, including some of those driven by cis-acting 
genetic variation, had no discernible adverse effects. However, mCAs 
commonly seen in blood cancers strongly increased cancer risk and 
could potentially be used for early detection—although we caution that 
these results are based on relatively short follow-up (4-9 years of can- 
cer outcomes) and need independent replication. As population-scale 
efforts to collect genotype data and health outcomes continue to 
expand—increasing both sample sizes and the power of population- 
based chromosomal phasing—we anticipate ever-more-powerful 
analyses of clonal haematopoiesis and its clinical sequelae. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

UK Biobank cohort and genotyping intensity data. The UK Biobank is a very 
large prospective study of individuals aged 40-70 years at assessment. Participants 
attended assessment centres between 2006 and 2010, where they contributed blood 
samples for genotyping and blood analysis and answered questionnaires about 
medical history and environmental exposures. In the years since assessment, health 
outcome data for these individuals (for example, cancer diagnoses and deaths) have 
been accrued via UK national registries. 

We analysed genetic data from the UK Biobank interim release (about 30% of 
the full UK Biobank) consisting of 152,729 samples typed on the Affymetrix UK 
BiLEVE and UK Biobank Axiom arrays with about 800,000 SNPs each and more 
than 95% overlap. We removed 480 individuals marked for exclusion from genomic 
analyses based on missingness and heterozygosity filters and one individual who 
had withdrawn consent, leaving 152,248 samples. We restricted the variant set to 
biallelic variants with missingness < 10% and we further excluded 111 variants 
found to have significantly different allele frequencies between the UK BiLEVE 
array and the UK Biobank array, leaving 725,664 variants on autosomes and the X 
chromosome. Finally, we additionally excluded 118,139 variants for which fewer 
than 10 samples (or for chrX, fewer than 5 female samples) were called as homozy- 
gous for the minor allele; we observed that genotype calls at these variants were 
susceptible to errors in which rare homozygotes were called as heterozygotes. We
phased the remaining 607,525 variants using Eagle2 with --Kpbwt=40,000 and
otherwise default parameters.

We transformed genotyping intensities to log₂ R ratio (LRR) and B-allele
frequency (BAF) values (which measure total and relative allelic intensities) after
affine-normalization and GC wave-correction in a manner similar to that
described previously (Supplementary Note 1). For each sample, we then computed s.d.(BAF)
among heterozygous sites within each autosome, and we removed 320 samples with
median s.d.(BAF) > 0.11, indicating low genotyping quality. Finally, we removed
an additional 725 samples with evidence of possible contamination (based on
apparent short interstitial CNN-LOH events in regions of long-range linkage
disequilibrium; Supplementary Note 1) and one sample without phenotype data,
leaving 151,202 samples for analysis.
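
The median s.d.(BAF) filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout (a mapping from autosome label to BAF values at heterozygous sites), the function names, and the use of the population standard deviation are all assumptions; only the 0.11 threshold comes from the text.

```python
from statistics import median, pstdev

MAX_MEDIAN_SD_BAF = 0.11  # threshold quoted in the text


def median_sd_baf(het_baf_by_autosome):
    """Median, across autosomes, of the s.d. of BAF at heterozygous sites.

    het_baf_by_autosome (hypothetical layout) maps an autosome label to the
    list of BAF values observed at the sample's heterozygous sites there.
    """
    per_chrom_sd = [
        pstdev(values)  # population s.d. is an assumption
        for values in het_baf_by_autosome.values()
        if len(values) >= 2  # skip autosomes with too few sites
    ]
    return median(per_chrom_sd)


def passes_baf_qc(het_baf_by_autosome, threshold=MAX_MEDIAN_SD_BAF):
    """True if the sample would be kept under the median s.d.(BAF) filter."""
    return median_sd_baf(het_baf_by_autosome) <= threshold
```

Taking the median across autosomes keeps a single chromosome with a large mosaic event (and hence a wide BAF spread) from causing an otherwise clean sample to be discarded.
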
Detection of mCAs using long-range haplotype phase. Here we outline the key 
ideas of our approach to mCA detection; full details are provided in Supplementary 
Note 1. The core intuition is to harness long-range phase information to search for 
local imbalances between maternal and paternal allelic fractions in a cell
population (Extended Data Fig. 1). The utility of haplotype phase for this purpose has
previously been recognized, but previous approaches have needed to account for
phase switch errors occurring roughly every megabase, a general challenge faced by
haplotype-based analyses. In the UK Biobank, we have phase information
accurate at the scale of tens of megabases, enabling a new modelling approach and
considerable gains in sensitivity for detection of large events at low cell fractions
(Supplementary Note 5). (Because our method is phase-based, it has the limitation
that it cannot detect events contained within regions of homozygosity. While this
issue is minor in our study of large events, other approaches originally developed
for detection of shorter constitutional or high-cell-fraction CNVs are not subject
to this limitation.)

Our technique employs a three-state hidden Markov model (HMM) to capture
mCA-induced deviations in allelic balance (|ΔBAF|) at heterozygous sites. (By
contrast, the hapLOH method tabulates ‘switch consistency’ between consecutive
heterozygous sites.) Our model has a single parameter θ, which represents the expected
absolute BAF deviation at germline hets within an mCA. In computationally
phased genotyping intensity data, multiplying phase calls with (signed) BAF
deviations produces contiguous regions within the mCA in which the expected phased
BAF deviation is either +θ or −θ (with sign flips at phase switch errors); outside
the mCA, no BAF deviation is expected. The three states of our HMM encode these
three possibilities, and emissions from the states represent noisy BAF
measurements. Transitions between the +θ and −θ states represent switch errors, while
transitions between ±θ and the 0 state capture mCA boundaries.
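
As an illustration of the three-state structure just described, the following sketch computes the log-likelihood of a sequence of phased BAF deviations for a given value of the model parameter (called theta here) using the standard forward algorithm. The Gaussian emission model, the noise level, the transition probabilities, and the flat starting distribution are all placeholder assumptions; the actual emission and transition parameters are described in Supplementary Note 1.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def hmm_log_likelihood(phased_dev, theta, sd=0.03, p_switch=0.01, p_boundary=0.001):
    """Forward-algorithm log-likelihood of phased BAF deviations.

    States: 0 -> mean +theta, 1 -> mean -theta, 2 -> mean 0 (outside the mCA).
    sd, p_switch and p_boundary are illustrative values, not fitted ones.
    """
    means = [theta, -theta, 0.0]
    # Row i holds the transition probabilities out of state i.
    trans = [
        [1 - p_switch - p_boundary, p_switch, p_boundary],  # +theta
        [p_switch, 1 - p_switch - p_boundary, p_boundary],  # -theta
        [p_boundary, p_boundary, 1 - 2 * p_boundary],       # 0 state
    ]
    log_trans = [[math.log(p) for p in row] for row in trans]
    log_start = math.log(1.0 / 3)  # flat start (assumption)

    alpha = [log_start + log_normal_pdf(phased_dev[0], means[s], sd) for s in range(3)]
    for x in phased_dev[1:]:
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in range(3)])
            + log_normal_pdf(x, means[s], sd)
            for s in range(3)
        ]
    return logsumexp(alpha)
```

Because the forward recursion sums over all state paths, uncertainty in where phase switches and event boundaries occur is integrated out rather than fixed. With theta set to 0 the three states become indistinguishable, so the result reduces to the likelihood of a model with no event.
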

Modelling observed phased BAF deviations using a parameterized HMM has 
the key benefit of naturally producing a likelihood ratio test statistic for
determining whether a chromosome contains an mCA. Explicitly, for a given choice of θ,
we can compute the total probability of the observed BAF data under the
assumption that mCA-induced BAF deviations have E[|ΔBAF|] = θ, using standard
HMM dynamic programming computations to integrate over uncertainty in phase
switches and mCA boundaries. Taking the ratio of the maximum likelihood over
all possible choices of θ to the likelihood for θ = 0 (that is, no mCA) yields a
test statistic. If the HMM perfectly represented the data, this test statistic could
be compared to an asymptotic distribution. However, we know in practice that
parameters within the HMM (for example, transition probabilities) are imperfectly
estimated, so we instead calibrated our test statistic empirically: we estimated its
null distribution by computing test statistics on data with randomized phase, and
we used this empirical null to control FDR. Finally, for chromosomes passing the
FDR threshold, we called mCA boundaries by sampling state paths from the HMM
(using the maximum likelihood value of θ).
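
The empirical calibration step can be sketched generically. In this minimal example, phase is randomized by flipping the sign of each phased BAF deviation independently, which is an assumption about the randomization scheme, and `toy_statistic` is a hypothetical stand-in for the HMM likelihood ratio statistic; any function of the phased deviations can be passed in its place.

```python
import random


def randomize_phase(phased_dev, rng):
    """Flip the sign of each phased deviation with probability 1/2 (assumed scheme)."""
    return [x if rng.random() < 0.5 else -x for x in phased_dev]


def empirical_null(phased_dev, statistic, n_draws, seed=0):
    """Test statistics recomputed on phase-randomized copies of the data."""
    rng = random.Random(seed)
    return [statistic(randomize_phase(phased_dev, rng)) for _ in range(n_draws)]


def empirical_p_value(observed, null_stats):
    """Add-one empirical P value: share of null statistics >= observed."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def toy_statistic(phased_dev):
    """Hypothetical stand-in statistic: n times the squared mean deviation."""
    n = len(phased_dev)
    return n * (sum(phased_dev) / n) ** 2
```

Randomizing phase destroys the long-range consistency that a true event produces while preserving the marginal noise in the intensities, so artefacts that inflate the statistic are reflected in the null as well.
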

The above detection procedure uses only BAF data and ignores LRR meas- 
urements by design (to be maximally robust to genotyping artefacts); however, 
after detecting events, we incorporated LRR data to call detected mCAs as loss, 
CNN-LOH, or gain. All mosaic chromosomal alterations cause BAF (measuring 
relative allelic intensity) to deviate from 0.5 at heterozygous sites, and losses and 
gains cause LRR (measuring total intensity) to deviate from 0, with deviations 
increasing with clonal cell fraction; accordingly, we observed that plotting detected 
events by LRR and BAF deviation produced three linear clusters (Fig. 2a),
consistent with previous work. We called copy number using chromosome-specific
clusters to take advantage of the differing frequencies of event types on different 
chromosomes. Because the clusters converge as BAF deviation approaches zero, 
we left copy number uncalled for detected mCAs at low cell fraction (with <95% 
confident copy number), comprising 29% of all detected mCAs. We then estimated 
clonal cell fractions as described previously. 
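
To make the relationship between clonal cell fraction, BAF deviation and LRR concrete, the sketch below uses the idealized copy-counting relationships for a clone present at cell fraction f in an otherwise diploid sample. These textbook formulas are an assumption for illustration (real array LRR values are typically compressed relative to the ideal) and are not necessarily the estimator used in the cited work.

```python
import math


def expected_signal(event_type, cell_fraction):
    """Idealized (|dBAF|, LRR) at a heterozygous site for a clone at fraction f."""
    f = cell_fraction
    if event_type == "loss":
        total, major = 2 - f, 1.0      # one haplotype lost in a fraction f of cells
    elif event_type == "gain":
        total, major = 2 + f, 1 + f    # one haplotype duplicated
    elif event_type == "cnn_loh":
        total, major = 2.0, 1 + f      # one haplotype replaces the other
    else:
        raise ValueError(f"unknown event type: {event_type}")
    baf_dev = major / total - 0.5      # deviation of the major-haplotype BAF from 0.5
    lrr = math.log2(total / 2)         # log2 ratio of total copies to diploid
    return baf_dev, lrr


def cell_fraction_from_baf(event_type, abs_baf_dev):
    """Invert the idealized |dBAF| relationship once copy number is called."""
    d = abs_baf_dev
    if event_type == "loss":
        return 4 * d / (1 + 2 * d)
    if event_type == "gain":
        return 4 * d / (1 - 2 * d)
    if event_type == "cnn_loh":
        return 2 * d
    raise ValueError(f"unknown event type: {event_type}")
```

Under these formulas all three event types give a BAF deviation and an LRR that tend to zero as f tends to zero, which is why the clusters converge and copy number becomes hard to call at low cell fraction.
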

As a post-processing step to exclude possible constitutional duplications,
we filtered events of length >10 Mb with LRR >0.35 or with LRR >0.2 and
|ΔBAF| >0.16, and we filtered events of length <10 Mb with LRR >0.2 or with
LRR >0.1 and |ΔBAF| >0.1. We chose these thresholds conservatively based on
visual inspection of LRR and BAF distributions, in which likely constitutional 
duplications formed well-defined clusters (Supplementary Note 1). (Most con- 
stitutional duplications were already masked in a pre-processing step involving a 
separate HMM described in Supplementary Note 1.) 
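
The post-processing thresholds above translate directly into a small predicate. The function name and argument layout are hypothetical, and the handling of an event of exactly 10 Mb (which the text does not specify) is an assumption.

```python
def is_likely_constitutional_duplication(length_mb, lrr, abs_baf_dev):
    """Apply the LRR / |dBAF| thresholds quoted in the text to one event."""
    if length_mb > 10:
        return lrr > 0.35 or (lrr > 0.2 and abs_baf_dev > 0.16)
    # The text gives the second rule for events <10 Mb; treating an event of
    # exactly 10 Mb as "short" is an assumption made here.
    return lrr > 0.2 or (lrr > 0.1 and abs_baf_dev > 0.1)


# Example: keep only events that do not look like constitutional duplications.
events = [
    {"length_mb": 20.0, "lrr": 0.40, "abs_baf_dev": 0.02},
    {"length_mb": 5.0, "lrr": 0.05, "abs_baf_dev": 0.03},
]
kept = [
    e for e in events
    if not is_likely_constitutional_duplication(e["length_mb"], e["lrr"], e["abs_baf_dev"])
]
```
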
Enrichment of mCA types in blood lineages. We analysed 14 blood count indices 
(counts and percentages of lymphocytes, basophils, monocytes, neutrophils, red 
cells, and platelets, as well as distribution widths of red cells and platelets) from 
complete blood count data available for 97% of participants. We restricted the anal- 
ysis to individuals of self-reported European ancestry (96% of the cohort), leaving 
140,250 individuals; we then stratified by sex and quantile-normalized each blood 
index after regressing out age, age squared, and smoking status. 

To identify classes of mCAs linked to different blood cell types, we first classified 
mCAs based on chromosomal location and copy number. For each autosome, we 
defined five disjoint categories of mCAs that comprised the majority of detected 
events: loss on p arm, loss on q arm, CNN-LOH on p arm, CNN-LOH on q arm, 
and gain. We subdivided loss and CNN-LOH events by arm but did not subdivide 
gain events because most gain events are whole-chromosome trisomies (Fig. 1). 
For chromosome X, we replaced the two loss categories with a single whole-chro- 
mosome loss category. Altogether, this classification resulted in 114 mCA types. 
We restricted our blood cell enrichment analyses to 78 mCA types with at least 10 
occurrences, and we further excluded the chr17 gain category (because nearly all 
of these events arise from i(17q) isochromosomes already counted as 17p- events; 
Supplementary Note 2). 
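
The count of 114 mCA types follows from the classification rules above: five categories on each of 22 autosomes plus four on chromosome X. The short sketch below enumerates them and applies the two exclusions; the category labels and the `counts` input are hypothetical names chosen for illustration.

```python
AUTOSOME_CATEGORIES = ["loss_p", "loss_q", "cnn_loh_p", "cnn_loh_q", "gain"]
# On chrX the two arm-level loss categories are replaced by one whole-chromosome loss.
X_CATEGORIES = ["loss_whole", "cnn_loh_p", "cnn_loh_q", "gain"]


def mca_types():
    """All (chromosome, category) pairs in the classification."""
    types = [(f"chr{c}", cat) for c in range(1, 23) for cat in AUTOSOME_CATEGORIES]
    types += [("chrX", cat) for cat in X_CATEGORIES]
    return types


def eligible_types(counts, min_count=10, excluded=(("chr17", "gain"),)):
    """Types with at least min_count occurrences, minus explicit exclusions.

    counts (hypothetical input) maps (chromosome, category) to the number of
    detected events of that type.
    """
    return [
        t for t in mca_types()
        if counts.get(t, 0) >= min_count and t not in excluded
    ]


print(len(mca_types()))  # 22 * 5 + 4
```
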

For each of the 77 remaining mCA types, we computed enrichment of mCAs
among individuals with anomalous (top 1%) values of each normalized blood
index using Fisher’s exact test (two-sided; P values reported throughout this man- 
uscript are from two-sided statistical tests unless explicitly stated otherwise). We 
reported significant enrichments passing an FDR threshold of 0.05 (Fig. 2f and 
Supplementary Table 6). 
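
For reference, a two-sided Fisher's exact test on a 2 × 2 table can be written with the standard library alone. This is a generic sketch of the test, not the authors' code; the table layout (rows: carriers versus non-carriers of an mCA type; columns: anomalous versus non-anomalous blood index) is an assumption about how the enrichment was tabulated. The two-sided P value sums the probabilities of all tables no more likely than the one observed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
```

With one test per mCA type and blood index, the resulting P values would then be passed to a multiple-testing procedure to obtain the FDR threshold of 0.05 mentioned above; the specific FDR procedure is not stated in this section.
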
Chromosome-wide association tests for cis associations with mCAs. To identify 
inherited variants influencing nearby mCAs, we performed two types of asso- 
ciation analysis. First, we searched for variants that increased the probability of 
developing nearby mCAs. For each variant, we performed a Fisher test for associ- 
ation between the variant and up to three variant-specific case-control phenotypes, 
defined by considering samples to be cases if they contained loss, CNN-LOH, or 
gain events containing the variant or within 4 Mb (to allow for uncertainty in 
event boundaries). We tested phenotypes with at least 25 cases; in total, 48 out of 
69 = 23 × 3 possible event types had at least 25 carriers, and the rest were excluded 
from association analyses. We performed these tests on 51 million imputed variants 
with minor allele frequency (MAF) > 2 × 10⁻⁵ (imputed by UK Biobank using 
merged UK10K and 1000 Genomes Phase 3 reference panels), excluding
variants with non-European MAF greater than five times their European MAF, which
tended to be poorly imputed. We analysed 120,664 individuals who remained 
after restricting to individuals of self-reported British or Irish ancestry, removing 
principal component outliers (>4 s.d.), and imposing a relatedness cutoff of 0.05 
(using plink --rel-cutoff 0.05). (In our non-GWAS analyses, which focused on 
mosaic individuals, we did not apply any special handling of related individuals as 
the number of related pairs was very small: for example, only 11 third-degree or 
closer relationships among 4,889 individuals with autosomal mosaicism.) 

We also ran a second form of association analysis searching for variants 
for which mCAs tended to shift allelic balance (analogous to allele-specific 
expression). For a given class of mCAs, for each variant, we examined heterozygous 
mosaic individuals for which the mCA overlapped the variant, and we performed a 
binomial test to check whether the mCA was more likely to delete or duplicate one 
allele rather than the other. We restricted the binomial test to individuals in which 
the variant was confidently phased relative to the mCA (that is, no disagreement 
in five random resamples from the HMM used to call the mCA). 
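
The binomial test for biased allelic shift can be reproduced from the two counts alone. The sketch below assumes a two-sided exact test against a null probability of 1/2, which matches the values quoted in the text (for example, 16 of 16 MPL events shifting in the same direction gives P ≈ 3 × 10⁻⁵); the function name is hypothetical.

```python
from math import comb


def binomial_two_sided(n_inc, n_dec):
    """Exact two-sided binomial P value for n_inc versus n_dec under p = 1/2."""
    n = n_inc + n_dec
    k = min(n_inc, n_dec)
    # Probability of a split at least as lopsided in one direction ...
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # ... doubled by symmetry of the p = 1/2 null, and capped at 1.
    return min(1.0, 2 * tail)


print(binomial_two_sided(0, 16))  # 16 of 16 events in one direction
print(binomial_two_sided(39, 2))  # 39 of 41 events in one direction
```
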

Given that the two association tests described above are independent, we applied 
a two-stage approach to identify robust genome-wide significant associations. We 
used a P value threshold of 10⁻⁸ for discovery in either test and then checked for
nominal P < 0.05 significance in the other test (reasoning that variants that
influenced mCAs would exhibit both types of association). At all loci with P < 10⁻⁸ for
either test, the most significant variant with P < 10⁻⁸ in one test reached nominal
significance in the other (Table 1). At identified loci, we further searched for
secondary independent associations reaching P < 10⁻⁶.

In our final analyses, we refined mCA phenotypes to slightly increase power to 
map associations. For the loci associated with 1p, 9p, and 15q CNN-LOH, we found 
that association strength improved by expanding case status to include all events 
reaching the telomere (because several detected telomeric events with uncertain 
copy number were probably actually CNN-LOH events associated with the same 
germline variants). For the association signal at FRA10B, we refined case status to 
only include terminal loss events extending from 10q25 to the telomere (because 
of the breakpoint specificity of this event). We verified that all association tests 
produced well-calibrated test statistics (Supplementary Note 3). 
Identity-by-descent analysis at MPL and FRA10B. At loci for which we found 
evidence of multiple causal rare variants, we searched for long haplotypes shared 
identical-by-descent among mCA carriers to further explore the possibility of 
additional or recurrent causal variants. We called IBD tracts using GERMLINE 
with haplotype extension. 

Simons Simplex Collection WGS data set. The Simons Simplex Collection (SSC) 
is a repository of genetic samples from autism simplex families collected by the 
Simons Foundation Autism Research Initiative (SFARI). We analysed 2,079
whole-genome sequences from the first phase of SSC sequencing (median coverage
37.8×) to examine whether mCAs we detected contributed to genetic risk of
autism. (The main data set consisted of 2,076 individuals in 519 quartets; we addi- 
tionally analysed three individuals that did not belong to a complete quartet but 
were of interest based on high read counts at FRA10B.) 

Detection and calling of 70-kb deletion at 15q26.3. We discovered the inherited 
70-kb deletion associated with 15q CNN-LOH and loss by mapping the 15q26.3 
association signal (specifically, the rs182643535 tag SNP) in WGS data (Fig. 4c and 
Extended Data Fig. 6). We then called this deletion in the UK Biobank SNP-array 
data using genotyping intensities at 24 probes in the deleted region (Extended 
Data Fig. 7). 

Detection and imputation of VNTRs at FRA10B. For all WGS samples with 10 
or more reads at the FRA10B locus, we attempted to perform local assembly of 
the reads and identify a primary VNTR motif in the assembly. We identified 12 
distinct primary motifs carried by 26 individuals in 14 families (Extended Data 
Fig. 5a, b and Supplementary Note 8). Owing to read dropout in many samples, 
it is possible that these VNTR motifs may be found in additional samples, 
and that other VNTR motifs may not have been detected. We imputed the 
VNTR sequences into UK Biobank using Minimac3. Full details are provided 
in Supplementary Note 8. 

GWAS and heritability estimation for trans drivers of clonality. We tested 
variants with MAF > 1% for trans associations with six classes of mCAs (any event, 
any loss, any CNN-LOH, any gain, any autosomal event, any autosomal loss) on 
120,664 unrelated individuals with European ancestry (described above) using 
BOLT-LMM, including 10 principal components, age, and genotyping array as
covariates. We also tested association with female X loss using an expanded set of
3,462 likely X loss calls at an FDR of 0.1, restricting this analysis to 66,685 female
individuals. In our targeted analysis of 86 variants implicated in previous GWAS,
we applied a Bonferroni significance threshold of 8.3 × 10⁻⁵ based on 86 variants
and 7 phenotypes. We estimated SNP heritability of X loss using BOLT-REML,
transforming estimates to the liability scale.
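
Two of the quantities in this paragraph are simple to compute. The Bonferroni threshold follows from the stated counts if a family-wise level of 0.05 is assumed. For the liability-scale conversion, the sketch uses the standard transformation for an unascertained population sample, h²(liability) = h²(observed) × K(1 − K) / z², where K is the trait prevalence and z is the standard normal density at the liability threshold; whether this exact form was used here is an assumption.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_variants, n_phenotypes):
    """Per-test significance threshold for n_variants x n_phenotypes tests."""
    return alpha / (n_variants * n_phenotypes)


def observed_to_liability_scale(h2_observed, prevalence):
    """Convert observed-scale heritability of a binary trait to the liability scale.

    Assumes a population sample with no case-control ascertainment.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability cut-off
    z = std_normal.pdf(threshold)                   # normal density at the cut-off
    return h2_observed * prevalence * (1 - prevalence) / z ** 2


print(bonferroni_threshold(0.05, 86, 7))  # about 8.3e-05
```
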

Analysis of X chromosome inactivation in GEUVADIS RNA sequencing data. 
To test for possible mediation of preferential X haplotype loss by biased X chro- 
mosome inactivation (XCI), we examined GEUVADIS RNA sequencing (RNA- 
seq) data for evidence of biased XCI near the primary biased loss association at 
Xp11.1. We identified three coding SNPs in FAAH2 within the pericentromeric 
linkage disequilibrium block containing the association signal. We analysed RNA- 
seq data for 61 European-ancestry individuals who were heterozygous for at least 
one SNP (60 of 61 were heterozygous for all three SNPs, and the remaining individual
was heterozygous at two of the SNPs). We used GATK ASEReadCounter
to identify allele-specific expression from RNA-seq BAM files. Most individuals
displayed strong consistent allele-specific expression across the three SNPs, as
expected for XCI in clonal lymphoblastoid cell lines; however, we observed
no evidence of systematically biased XCI in favour of one allele or the other
(Supplementary Table 10).
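
One simple way to ask whether XCI is systematically biased towards one allele is a sign test across individuals. The sketch below is illustrative only and is not the analysis reported in Supplementary Table 10: it assumes per-SNP reference and alternate read counts (such as those a read counter would report) and that each individual has already been classified by which allele is predominantly expressed. The function names are hypothetical.

```python
from math import comb

def allelic_ratio(ref_count, alt_count):
    """Fraction of reads carrying the reference allele at one heterozygous SNP."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads at this site")
    return ref_count / total

def sign_test_p(n_ref_active, n_alt_active):
    """Two-sided exact binomial (sign) test of the null hypothesis that either
    allele is equally likely to be the expressed one across individuals."""
    n = n_ref_active + n_alt_active
    k = min(n_ref_active, n_alt_active)
    # Probability of a split at least as lopsided as the observed one (one tail).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Strong allele-specific expression within each clonal cell line (ratios near 0 or 1) combined with a large sign-test P value across individuals would match the pattern described above: skewed expression per sample, but no preferred allele overall.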

UK Biobank cancer phenotypes. We analysed UK cancer registry data provided 
by UK Biobank for 23,901 individuals with one or more prevalent or incident 
cancer diagnoses. Cancer registry data included date of diagnosis and ICD-O-3 
histology and behaviour codes, which we used to identify individuals with diagnoses
of CLL, MPN, or any blood cancer. Because our focus was on prognostic
power of mCAs for predicting diagnoses of incident cancers more than one year 
after DNA collection, we excluded all individuals with cancers reported prior to 
this time (either from cancer registry data or self-report of prevalent cancers). 
We also restricted our attention to the first diagnosis of cancer in each individual, 
and we censored diagnoses after 30 September 2014, as suggested by UK Biobank 
(resulting in a median follow-up time of 5.7 years, s.d. 0.8 years, range 4-9 years). 
Finally, we restricted analyses to individuals with self-reported European ancestry. 
These exclusions reduced the total counts of incident cases to 78 (CLL), 42 (MPN), 
and 441 (any blood cancer), which we analysed with 119,330 controls. In our pri- 
mary analyses, we further eliminated individuals with any evidence of potential 
undiagnosed blood cancer based on anomalous blood counts (lymphocyte count 
outside the normal range of 1-3.5 × 10⁹/l, red cell count >6.1 × 10¹²/l for males or
>5.4 × 10¹²/l for females, platelet count >450 × 10⁹/l, red cell distribution width
>15%), leaving incident case counts of 36 (CLL), 23 (MPN), and 327 (any blood 
cancer). 
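
The blood-count exclusion described above amounts to a simple per-individual filter. The following is a minimal sketch, assuming the conventional units of 10⁹ cells per litre for lymphocytes and platelets and 10¹² cells per litre for red cells; the function name and argument layout are hypothetical and do not correspond to UK Biobank field names.

```python
def has_anomalous_blood_counts(sex, lymphocytes, red_cells, platelets, rdw_percent):
    """Return True if any blood count falls outside the thresholds quoted in the text.

    lymphocytes, platelets: 10^9 cells per litre (assumed units)
    red_cells: 10^12 cells per litre (assumed units)
    rdw_percent: red cell distribution width, in percent
    sex: "male" or "female"
    """
    red_cell_limit = 6.1 if sex == "male" else 5.4
    return (
        lymphocytes < 1.0 or lymphocytes > 3.5   # outside normal lymphocyte range
        or red_cells > red_cell_limit            # sex-specific red cell limit
        or platelets > 450
        or rdw_percent > 15
    )
```

Individuals for whom this returns True would be removed from the primary analyses as having possible undiagnosed blood cancer.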

Estimation of cancer risk conferred by mCAs. To identify classes of mCAs asso- 
ciated with incident cancer diagnoses, we classified mCAs based on chromosomal 
location and copy number into the 114 classes described above. We then restricted 
our attention to the 45 classes with at least 30 carriers (to reduce our multiple 
hypothesis burden, given that we would be underpowered to detect associations 
with the rarer events). For each mCA class, we considered a sample to be a case 
if it contained only the mCA or if the mCA had the highest cell fraction among 
all mCAs detected in the sample (that is, we did not count carriers of subclonal 
events as cases). We computed odds ratios and P values for association between 
mCA classes and incident cancers using Cochran-Mantel-Haenszel (CMH) tests
to stratify by sex and by age (in six 5-year bins). We used the CMH test to compute 
odds ratios (for incident cancer any time during follow-up) rather than using a Cox 
proportional hazards model to compute hazard ratios because both the mCA phe- 
notypes and the incident cancer phenotypes were rare, violating normal approx- 
imations underlying regression. We reported significant associations passing an 
FDR threshold of 0.05 (Fig. 5a and Supplementary Table 12). 
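
For readers unfamiliar with the CMH test, the sketch below computes the Mantel-Haenszel pooled odds ratio and the CMH chi-square statistic from a set of stratified 2 × 2 tables (for example, one table per sex and age bin). It is a textbook implementation without continuity correction, not the code used in this study, and the table layout is an assumption made for illustration.

```python
from math import erfc, sqrt

def cmh_test(tables):
    """Cochran-Mantel-Haenszel analysis of stratified 2x2 tables.

    Each table is a tuple (a, b, c, d):
        a = carriers with cancer,      b = carriers without cancer,
        c = non-carriers with cancer,  d = non-carriers without cancer.
    Returns (pooled odds ratio, chi-square statistic, P value); the chi-square
    statistic has 1 degree of freedom and no continuity correction.
    """
    or_num = or_den = 0.0
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        or_num += a * d / n
        or_den += b * c / n
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = or_num / or_den if or_den > 0 else float("inf")
    chi2 = (observed - expected) ** 2 / variance if variance > 0 else 0.0
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 d.f.
    return odds_ratio, chi2, p_value
```

Because the statistic pools evidence across strata rather than fitting a regression model, it remains usable when both the exposure and the outcome are rare, which is the motivation given above.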

Prediction of incident CLL. We considered four nested logistic models for predic- 
tion of incident CLL. In the first model, a baseline, we included only age and sex as 
explanatory variables. In the second model, we added CLL genetic risk (computed 
using 14 high-confidence GWAS hits that had both been previously published
and reached P < 5 × 10⁻⁸). In the third model, we added log lymphocyte count. In
the full model, we added explanatory variables for 13q and +12 events. 

We assessed the accuracy of each model on two benchmark sets of samples. We 
restricted our primary analyses to individuals with normal lymphocyte counts 
(1-3.5 × 10⁹/l) at assessment (that is, exhibiting at most slight clonality); in auxiliary
analyses, we removed this restriction (and expanded the full prediction model
to include 11q-, +12, 13q-, 13q CNN-LOH, 14q-, 22q-, and the total number 
of other autosomal events). We performed tenfold stratified cross-validation to 
compare model performance. We assessed prediction accuracy by merging results 
from all cross-validation folds and computing area under the receiver operating 
characteristic curve (AUC) (Fig. 5b), and we also measured precision-recall perfor- 
mance (Extended Data Fig. 9). (We caution that while AUC is commonly used to 
assess discriminative power, AUC does not have a direct clinical interpretation.)

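
Merging predictions from all cross-validation folds and computing a single AUC can be done with the rank-sum identity, without any external library. The sketch below assumes that `labels` and `scores` have already been pooled across held-out folds; it is an illustration of the metric, not the evaluation code used in this study.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: 1 for incident cases, 0 for controls.
    scores: predicted risks pooled across all held-out cross-validation folds.
    Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples tied with sample i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

The result equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, which is why it measures discrimination but says nothing directly about absolute risk.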
Estimation of mortality risk conferred by mCAs. We analysed UK death registry 
data provided by UK Biobank for 4,619 individuals reported to have died since 
assessment. We censored deaths after 31 December 2015, as suggested by UK 
Biobank, leaving 4,518 reported deaths over a median follow-up time of 6.9 years 
(range 5-10 years). We examined the relationship between mCAs and mortal- 
ity, aiming to extend previous observations that mosaic point mutations increase 
mortality risk. For this analysis, we were insufficiently powered to stratify
mCAs by chromosome owing to the weaker effects of mCAs on mortality risk 
and the relatively small number of deaths reported during follow-up. We therefore 
stratified mCAs only by copy number and computed the hazard ratio conferred 
by each event class using a Cox proportional hazards model. We restricted these 
analyses to individuals with self-reported European ancestry, and we adjusted for 
age and sex as well as smoking status, which was previously associated with clonal 
haematopoiesis and associates with mosaicism in the UK Biobank
(P = 0.00017). We did not exclude individuals based on blood counts in these
analyses (or in our time-to-malignancy versus clonal fraction analyses), hence the 
larger sample sizes in Fig. 5c, d than in Fig. 5a, b. 

Code availability. Code used to perform the analyses in this study is available from 
the corresponding authors upon request. 

Data availability. Mosaic event calls are available in the Supplementary Data.
Access to the UK Biobank Resource is available via application
(http://www.ukbiobank.ac.uk/). Approved researchers can obtain the SSC
population data set described in this study by applying at https://base.sfari.org.

Extended Data Fig. 1 | Examples of mosaic events called using phased
genotyping intensities. a-c, UK Biobank mCA sample 2791 has a mosaic
deletion of chr13 from approximately 31-53 Mb that cannot be confidently
called from unphased BAF and LRR data (a, c). However, the existence of
an event is evident in the phased BAF data (b), and the regional decrease
in LRR indicates that this event is a deletion. In b, mean phased BAF is
plotted for SNPs aggregated into bins spanning n = 25 heterozygous sites;
the same bins are used for c. Error bars, s.e.m. d-f, Sample 1645 has a
mosaic CNN-LOH on chr9p from the 9p telomere to about 26 Mb that
cannot be confidently called from unphased BAF data (d) but is evident
in phased BAF data (e). A phase switch error causes a sign flip in phased
BAF at approximately 20 Mb. The lack of a shift in LRR in the region (f)
indicates that this event is a CNN-LOH. In e, mean phased BAF is plotted
for SNPs aggregated into bins spanning n = 50 heterozygous sites; the
same bins are used for f. Error bars, s.e.m. g-i, Sample 2464 has a
full-chromosome mosaic event on chr12 that cannot be confidently called
from unphased BAF and LRR data (g, i) but is evident in phased BAF
data (h). Several phase switch errors cause sign flips in phased BAF across
chr12. The slight positive shift in mean LRR (i) indicates that this event is
most likely to be a mosaic gain of chr12. In h, mean phased BAF is plotted
for SNPs aggregated into bins spanning n = 50 heterozygous sites; the same
bins are used for i. Error bars, s.e.m.
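
The binning used in these panels (mean phased BAF with s.e.m. error bars over a fixed number of heterozygous sites) can be sketched as follows. This is an illustrative helper, not the plotting code used for the figure; it assumes the input is a position-ordered list of signed BAF deviations from 0.5 whose sign has already been set by the phased haplotype, and it drops a trailing partial bin.

```python
from math import sqrt

def bin_phased_baf(phased_baf_dev, sites_per_bin=25):
    """Mean and s.e.m. of phased BAF deviations in consecutive bins.

    phased_baf_dev: signed BAF deviations from 0.5 at heterozygous sites,
        ordered by position, with sign determined by the phased haplotype.
    sites_per_bin: number of heterozygous sites per bin (must be at least 2).
    Returns a list of (mean, sem) tuples, one per complete bin.
    """
    if sites_per_bin < 2:
        raise ValueError("sites_per_bin must be at least 2")
    bins = []
    for start in range(0, len(phased_baf_dev) - sites_per_bin + 1, sites_per_bin):
        chunk = phased_baf_dev[start:start + sites_per_bin]
        mean = sum(chunk) / sites_per_bin
        var = sum((x - mean) ** 2 for x in chunk) / (sites_per_bin - 1)
        bins.append((mean, sqrt(var / sites_per_bin)))
    return bins
```

Averaging helps only once the data are phased: with unphased data the sign of each deviation is effectively random, so a small mosaic shift cancels within a bin, whereas phased deviations share a sign (up to switch errors) and accumulate.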

[Figure: age distributions of individuals with mosaic calls. Legend: high-confidence mosaic calls (6,545 calls passing the FDR 0.01 threshold); medium-confidence mosaic calls (1,797 calls between the FDR 0.05 and 0.01 thresholds); low-confidence mosaic calls (1,910 calls between the FDR 0.10 and 0.05 thresholds); all samples (baseline). FDR thresholds are permutation-based. Expected medium-confidence distribution (based on FDRs from phase randomization): 0.05 × 8,342 − 0.01 × 6,545 = 352 false positives among 1,797 calls, that is, a 20:80 mix of false positives (baseline) and true positives (high confidence).]

Extended Data Fig. 2 | Estimation of true FDR using age distributions
of individuals with mCA calls. We generated age distributions for (i)
‘high-confidence’ detected events passing a permutation-based FDR
threshold of 0.01 (bright red); (ii) ‘medium-confidence’ events below the
FDR threshold of 0.01 but passing an FDR threshold of 0.05 (darker red);
and (iii) ‘low-confidence’ events below the FDR threshold of 0.05 but
passing an FDR threshold of 0.10 (darkest red; not analysed but plotted for
context). We compared these distributions to the overall age distribution
of UK Biobank participants (grey). On the basis of the numbers of events
in each category, approximately 20% of medium-confidence detected
events are expected to be false positives. To estimate our true FDR, we
regressed the medium-confidence age distribution on the high-confidence
and overall age distributions, reasoning that the medium-confidence
age distribution should be a mixture of correctly called events with age
distribution similar to that of the high-confidence events, and spurious
calls with age distribution similar to the overall cohort. We observed a
regression weight of 0.31 for the component corresponding to spurious
calls, in good agreement with expectation, and implying a true FDR of
7.5% (6.2-8.8%, 95% CI based on regression fit on n = 6 age bins).
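
The numbers in this caption can be reproduced with a short calculation. The sketch below is one way to arrive at the quoted figures and is not necessarily the authors' exact derivation; in particular, it assumes that the high-confidence set contains false positives at its nominal rate of 1%. The call counts are taken from the figure legend.

```python
high_conf = 6545     # calls passing FDR 0.01 (figure legend)
medium_conf = 1797   # calls between FDR 0.01 and 0.05 (figure legend)
total = high_conf + medium_conf  # 8,342 calls passing FDR 0.05

# Expected false positives among medium-confidence calls, from the nominal FDRs.
expected_fp_medium = 0.05 * total - 0.01 * high_conf
expected_fraction = expected_fp_medium / medium_conf

# Observed: regression weight of the component resembling the overall cohort.
spurious_weight = 0.31
fp_high = 0.01 * high_conf  # ASSUMPTION: high-confidence set is at its nominal FDR
fp_medium = spurious_weight * medium_conf
true_fdr = (fp_high + fp_medium) / total

print(round(expected_fp_medium))    # 352
print(round(expected_fraction, 2))  # 0.2
print(round(100 * true_fdr, 1))     # 7.5
```

Under these assumptions, a spurious-call weight of 0.31 among medium-confidence calls corresponds to roughly 620 false positives among 8,342 calls, or an overall FDR of about 7.5%.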

Extended Data Fig. 3 | Clonal cell fractions of co-occurring events
generally suggest co-existence within the same cell population. For
each pair of significantly co-occurring events (Fig. 2b), we compared the
clonal fractions of the two events within each individual that carried both
events. Each point in the plots corresponds to an individual carrying the
pair of events under consideration; individuals are colour-coded by the
total number of events they carry. For nearly all pairs of events, the clonal
fractions of the two events were very similar in most individuals carrying
both events, suggesting that the events occurred in the same clonal cell
population. A few exceptions do seem to exist; for example, 22q- versus
13q CNN-LOH cell fraction; here, the cell fractions suggest that 13q
CNN-LOH events may be present in a subclone. This observation is consistent
with acquired uniparental disomy of 13q providing a second hit within a
del(13q14) clonal expansion, as we see in Extended Data Fig. 8. (We did
not include del(13q14) vs. 13q CNN-LOH in this plot because inference
of clonal fractions is complex for these overlapping events; see Extended
Data Fig. 8.)

Extended Data Fig. 4 | Replication of previous association between
JAK2 46/1 haplotype and 9p CNN-LOH in cis due to clonal selection.
The common JAK2 46/1 haplotype has previously been shown to confer
risk of somatic JAK2 V617F mutation such that subsequent 9p CNN-LOH
produces a strong proliferative advantage (right). In our analysis,
CNN-LOH on 9p is strongly associated with JAK2 46/1 (P=1.6 x 107}, 
OR = 2.7 (2.1-3.5); Fisher’s exact test on n = 120,664 individuals) with 

[Figure panels: a, Variable number tandem repeats (VNTRs) at FRA10B identified in WGS data. b, Identity-by-descent graph at FRA10B for UK Biobank del(10q) individuals, coloured according to imputed VNTRs. c, IBD graph at MPL for UK Biobank 1p mosaic individuals. Edges, IBD > 2.5 cM (edge weights increase with IBD length); red nodes, carriers of rare MPL nonsense mutation (rs369156948); green nodes, carriers of long rare haplotype (tag: rs144279563); blue nodes, carriers of long rare haplotype (tag: rs182971382).]

Extended Data Fig. 5 | Evidence of multiple causal variants for 10q25.2
breakage and 1p CNN-LOH associations. a, Multiple expanded repeats
at FRA10B drive breakage at 10q25.2. We identified 12 distinct primary
repeat motifs at FRA10B in 26 whole-genome-sequenced individuals
from 14 families (labelled VNTR-N-x, where N denotes length in base
pairs); carriers of these repeats exhibit varying degrees of FRA10B repeat
expansion (Supplementary Note 8). The repeat motifs are AT-rich and are
similar to FRA10B repeats previously reported. The alignment provided
here includes the repeat motifs that were most frequently observed in
FRA10B expanded alleles (E8, E13, E17, and E19) along with a few other
closely related expanded repeat motifs (E10, E11, and E12). b, Carriers
of the 10q terminal deletion in the UK Biobank share long haplotypes at
10q25.2 identical-by-descent. Square nodes in the IBD graph correspond
to males and circles to females. Node size is proportional to cell fraction
and edge weight increases with IBD length. Coloured nodes indicate
imputed carriers of variable number tandem repeats (VNTRs) at FRA10B
(Supplementary Table 7); colour intensity scales with imputed dosage.
c, Identity-by-descent graph at MPL locus (chr1:43.8 Mb) on individuals
with mCAs on chr1 extending to the p telomere. Coloured nodes indicate
imputed carriers of SNPs independently associated with mosaic 1p
CNN-LOH (Fig. 4a).

Extended Data Fig. 6 | Germline CNVs at 15q26.3. a, Read depth profile
plot of WGS samples in the terminal 700 kb of chr15q. Three individuals
in one family carry an approximately 70-kb deletion at 15q26.3, and a
fourth carries the same deletion along with an approximately 290-kb
duplication (probably on the same haplotype, based on population
frequencies of these events; see Extended Data Fig. 7). These four
individuals (highlighted in blue) segregate with the rs182643535:T allele in
the WGS cohort. Inset: the parental carrier in the family, individual 10921,
has detectable mosaicism in two distinct 15q CNN-LOH subclones (one
starting at 41.64 Mb with 4.6% cell fraction, the other starting at 71.64 Mb
with an additional 2.0% cell fraction). b, Expanded read depth profile
plot, with deletion-only individuals highlighted in blue and the del + dup
individual highlighted in green. Breakpoint analysis indicates that the
deletion spans chr15:102151467-102222161 and contains a 1,139-bp
mid-segment (chr15:102164897-102166035) that is retained in inverted
orientation. The duplication spans chr15:102026997-102314016.

CNVs at 15q26.3. Using identified breakpoints of the germline 70-kb
deletion and 290-kb duplication (Extended Data Fig. 6), we computed
mean genotyping intensity (LRR) in UK Biobank samples within the
70-kb deletion region (24 probes) and within the flanking 220-kb region
(97 probes). Individuals are plotted by flanking 220-kb mean LRR versus
70-kb mean LRR and coloured according to mosaic status for somatic
15q mCAs. UK Biobank samples carrying the 70-kb deletion, 290-kb
duplication, and both (del+dup) are all easily identifiable in distinct
clusters. The plot also appears to contain clusters with higher copy
number. Of the three CNV-carrying alleles, the simple 70-kb deletion is
the only one that predisposes to mCAs. Most mosaic events containing
the 70-kb deletion are CNN-LOH events that make cells homozygous for
the 70-kb deletion; two individuals have somatic loss of the homologous
(normal) chromosome, making cells hemizygous for the 70-kb deletion.

Extended Data Fig. 8 | Phased BAF plots of chromosomes with multiple
CNN-LOH subclones. All of the plots exhibit step functions of increasing
|ΔBAF| towards a telomere, which is the hallmark of multiple clonal cell
populations containing distinct CNN-LOH events that affect different
spans of a chromosomal arm (all extending to the telomere). Distinct
|ΔBAF| values (called using an HMM) are indicated with different
colours. Flips in the sign of phased BAF usually correspond to phase
switch errors. Two samples exhibit high switch error rates: 14q individual
3067 (explained by non-European ancestry), and 1p individual 23
(explained by very high |ΔBAF|; extreme shifts in genotyping intensities
result in poor genotyping quality). All five individuals with multiple
CNN-LOH events on chr13q appear to contain switch errors over 13q14, but
these switches are actually explained by overlapping 13q14 deletions; see
Supplementary Note 1 for detailed discussion.

Extended Data Fig. 9 | CLL prediction accuracy: receiver operating
curves and precision-recall curves. CLL prediction benchmarks using
tenfold stratified cross validation on: only individuals with lymphocyte
counts in the normal range (1 × 10⁹/L to 3.5 × 10⁹/L), as in our primary
analyses (n = 36 cases, 113,923 controls) (a, b); and individuals with any
lymphocyte count (n = 78 cases, 118,481 controls) (c, d). a matches Fig. 5b,
and b shows the precision-recall curve from the same analysis. c and d
correspond to an analogous analysis in which we removed the restriction
on lymphocyte count and also used additional mosaic event variables for
prediction (11q-, 14q-, 22q-, and total number of autosomal events).
In both benchmarks, individuals with previous cancer diagnoses or CLL
diagnoses within 1 year of assessment were excluded; however, some
individuals with very high lymphocyte counts pass this filter (and probably
already had CLL at assessment despite being undiagnosed for more than
1 year), hence the difference in apparent prediction accuracy between the
two benchmarks.

Extended Data Fig. 10 | Mosaic chromosomal alterations detected in
CLL cases sorted by lymphocyte count. Individuals are stratified by
cancer status at DNA collection (no previous diagnosis versus any previous
diagnosis), and mCAs (red, loss; green, CNN-LOH; blue, gain; grey,
undetermined) are plotted per chromosome as coloured rectangles (with
height increasing with BAF deviation).

Single-cell analysis of early progenitor cells that build coronary arteries

Tianying Su, Geoff Stanley, Rahul Sinha, Gaetano D’Amato, Soumya Das, Siyeon Rhee, Andrew H. Chang,
Aruna Poduri, Brian Raftrey, Thanh Theresa Dinh, Walter A. Roper, Guang Li, Kelsey E. Quinn, Kathleen M. Caron,
Sean Wu, Lucile Miquerol, Eugene C. Butcher, Irving Weissman, Stephen Quake & Kristy Red-Horse


Arteries and veins are specified by antagonistic transcriptional programs. However, during development and regeneration, 
new arteries can arise from pre-existing veins through a poorly understood process of cell fate conversion. Here, using 
single-cell RNA sequencing and mouse genetics, we show that vein cells of the developing heart undergo an early cell 
fate switch to create a pre-artery population that subsequently builds coronary arteries. Vein cells underwent a gradual 
and simultaneous switch from venous to arterial fate before a subset of cells crossed a transcriptional threshold into 
the pre-artery state. Before the onset of coronary blood flow, pre-artery cells appeared in the immature vessel plexus, 
expressed mature artery markers, and decreased cell cycling. The vein-specifying transcription factor COUP-TF2 (also 
known as NR2F2) prevented plexus cells from overcoming the pre-artery threshold by inducing cell cycle genes. Thus, 
vein-derived coronary arteries are built by pre-artery cells that can differentiate independently of blood flow upon the
release of inhibition mediated by COUP-TF2 and cell cycle factors. 


The ability of cells to switch fates and acquire new identities is critical 
for organogenesis and regeneration, but the mechanisms that underlie 
cell fate conversions are poorly understood. The vasculature is a model 
for this process because it initially differentiates into arteries and veins 
whose transcriptional networks antagonize each other (Notch signalling
maintains arteries while COUP-TF2 maintains veins). However,
during development and regeneration, veins can become the source of
new arteries. The timing and requirements of vein-to-artery conversions
are not known, but could inform artery regeneration.

In mice, a portion of the coronary arteries of the heart develop from 
a vein called the sinus venosus (SV; Fig. 1a). During embryogenesis, 
endothelial cell-lined angiogenic sprouts migrate from the SV to fill the 
heart with an immature coronary vessel plexus. This plexus unites with
plexus vessels from the endocardium, and, together, they remodel
into arteries, capillaries and veins. The plexus lacks blood flow until it
attaches to the aorta, and arterial morphogenesis requires this event,
suggesting that blood flow initiates artery development. However, it
has been difficult to delineate cell fate changes during coronary angio- 
genesis owing to the limited number of molecular markers and bulk 
transcriptional analyses of heterogeneous populations. 

Single-cell RNA sequencing (scRNA-seq) can overcome this lim- 
itation by producing single-cell-resolution maps of developmental 
transitions. Here, we developed a statistical test that categorizes sub- 
populations within scRNA-seq data sets as continuous or discrete to 
identify candidate developmental transitions. Computational and in
vivo analysis of the SV-to-coronary transition revealed that SV cells 
of the mouse heart undergo a gradual conversion from vein to artery 
before a subset crosses a threshold to differentiate into pre-artery 
cells. Pre-artery cells differentiated before blood flow from the SV 
and endocardium and produced a large portion of coronary arteries. 
COUP-TF2 blocked progression to the pre-artery state through 


activation of cell cycle genes, which ultimately inhibited artery devel- 
opment. Understanding this and other cell fate switches and inhibitory 
signals will advance our knowledge of tissue development and could 
improve regenerative medicine. 


Finding developmental transitions in scRNA-seq data 
We performed a two-step analysis that identified and clustered cell 
subtypes by iterative robust principal component analysis (rPCA), and 
then subjected clusters to a pairwise discreteness test (Fig. 1b). First, 
cell subtype clusters were manually defined on the basis of unique 
gene expression patterns and cell separation in multiple iterations of 
rPCA (Fig. 1b). rPCA was better than classical PCA at separating
small subpopulations of cells (Extended Data Fig. 1a). We also
replaced default principal component scores with a ‘sum of the top 60
genes’ score because it was less correlated with technical artefact and
better correlated with cluster-specific genes (Extended Data Fig. 1b, c).
Cell cycle heterogeneity was also removed (Extended Data Fig. 1d), and
plots were inspected to confirm the absence of doublets (Extended Data
Fig. 1e). This process resulted in cell clusters that correlate well with
genes that define cell identity, and not with cell cycle heterogeneity or
technical artefact (Extended Data Fig. 1c, d).

Second, we developed the pairwise discreteness test to determine 
whether clusters were discrete or continuous (that is, connected by 
intermediate or transitioning cells). This statistical test projects pairs 
of subpopulations onto a linear axis of cell identity, measures the size 
of the gap between the populations, and estimates the number of inter- 
mediate cells (Fig. 1b and Extended Data Fig. 1f). It also determines the 
strength of continuity (Extended Data Fig. 1h), and could be confirmed 
using simulated data (Extended Data Fig. 1h). Combining the results 
created a relationship graph (Fig. 1b), which could identify candidate 
developmental transitions. Then, cell fate changes could be analysed
in high resolution by observing gene expression changes across
continuous populations (Fig. 1b).

Fig. 1 | Identifying pre-artery cells using scRNA-seq. a, b, Schematics of
coronary artery development (a) and computational pipeline
(fin: estimated fraction of cells that are intermediate; x, width of the
largest gap in scores between populations) (b). c, Relationship graph for
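
The idea behind the pairwise discreteness test (project two clusters onto a linear axis, measure the largest gap, and estimate how many cells are intermediate) can be illustrated with a small sketch. This is not the published test: the choice of axis (the line joining the two cluster centroids), the rescaling, and the definition of an intermediate cell (the middle third of the axis) are all assumptions made here for illustration, and the function name is hypothetical.

```python
def discreteness_summary(cluster_a, cluster_b):
    """Project two clusters onto the axis joining their centroids.

    cluster_a, cluster_b: lists of equal-length expression vectors.
    Returns (largest_gap, intermediate_fraction), with scores rescaled so that
    the centroid of cluster A sits at 0 and the centroid of cluster B at 1.
    """
    dim = len(cluster_a[0])

    def centroid(cells):
        return [sum(cell[i] for cell in cells) / len(cells) for i in range(dim)]

    ca, cb = centroid(cluster_a), centroid(cluster_b)
    axis = [b - a for a, b in zip(ca, cb)]
    norm2 = sum(v * v for v in axis)
    if norm2 == 0:
        raise ValueError("clusters have identical centroids")

    def score(cell):
        # Scalar projection onto the axis: 0 at centroid A, 1 at centroid B.
        return sum((x - a) * v for x, a, v in zip(cell, ca, axis)) / norm2

    scores = sorted(score(cell) for cell in cluster_a + cluster_b)
    # Largest empty stretch between neighbouring cells lying between the centroids.
    inner = [0.0] + [s for s in scores if 0.0 < s < 1.0] + [1.0]
    largest_gap = max(hi - lo for lo, hi in zip(inner, inner[1:]))
    # ASSUMPTION: "intermediate" cells are those in the middle third of the axis.
    n_mid = sum(1 for s in scores if 1 / 3 < s < 2 / 3)
    return largest_gap, n_mid / len(scores)
```

A pair of clusters with a wide gap and no intermediate cells would be called discrete, whereas a narrow gap with many intermediate cells suggests a continuum and hence a candidate developmental transition.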

We used this pipeline to analyse 843 ApjCreER lineage-labelled (Cre 
expressed in SV) cardiac endothelial cells from hearts removed from 
mouse embryos at embryonic day 12.5 (E12.5) (Extended Data Fig. 1g). 
Our data set contained endothelial cells from the SV, SV-derived coro- 
nary vessels, venous valves, valve mesenchyme, and some ventricular 
endocardial cells (Extended Data Fig. 1i, j). Clustering and the pair- 
wise discreteness test revealed a continuum between coronary vessel 
subtypes, the SV, venous valves, ventricular endocardium, and mesenchyme
(Pdgfra+, Pecam1 low/−) (Fig. 1c and Extended Data Fig. 1i-k).
These associations are consistent with anatomical relationships (SV is
adjacent to venous valves and endocardium) and previous lineage tracing
experiments (SV transitions into coronary vessels and endocardium
transitions into mesenchyme). Thus, our pipeline can identify
subpopulations and recapitulate known developmental transitions and 
anatomical relationships. 


Pre-artery cells differentiate before blood flow 

We analysed the developmental transition linking SV coronary pro- 
genitors (SVc) and coronary vessels (Fig. 1c, dotted line). Only the SVc 
was included because clustering indicated that the SV had two domains 
(Fig. 1c), and this was confirmed using immunofluorescence and in situ 
hybridization (Extended Data Fig. 2a-f). The SVc was anatomically 
and transcriptionally continuous with coronary vessels, whereas the 
SVv (SV valve proximal) was continuous with venous valves (Fig. 1c, 
Extended Data Fig. 2d, f). Therefore, rPCA of the SVc and coronary 
vessels was performed to study the SVc-coronary vessel continuum 
(Fig. 1d). 

Unexpectedly, the SVc-coronary vessel continuum identified cells
that were transcriptionally distinct and expressed genetic markers of
mature arterial cells (Fig. 1d). We previously reported that plexus
cells express arterial genes such as Dll4 and Efnb2, but these are also
expressed in angiogenic vessels, and are not artery-specific. The
scRNA-seq analysis revealed that, within the Dll4+ domain, some
cells had initiated a distinctive transcriptional program, shifting
away in the rPCA plot (Fig. 1d). Cells within this subset specifically
expressed mature artery-specific genes, including Cx40 (also known
as Gja5) (Fig. 1d). Analysis of multiple arterial and venous genes in
single cells or as averages within clusters (defined in Extended Data
Fig. 2g) revealed that many arterial genes were either specific to or
significantly increased in the Cx40+ cluster (Fig. 1d and Extended Data
Fig. 3a, b). Multiple venous genes were either completely depleted or
significantly downregulated (Fig. 1d and Extended Data Fig. 3c, d).
Comparison of expression between the SVc and arterial populations
revealed that SV-derived cells showed an extensive switch towards arterial
fate (Fig. 1e).

ApjCreER-traced endothelial subtypes. d, Pre-artery cells extend from the
plexus in the SVc-coronary vessel continuum. Gene expression in brown.
n = 415 cells. e, Heat map of venous and arterial genes. At, atria; endo,
endocardium; ven, ventricle.

We next compared E12.5 arterial cells with adult coronary vessel 
cells. Each embryonic cell was matched to the adult cell to which it 
was most similar within the artery-capillary-vein continuum formed
by adult coronary vessels (Extended Data Fig. 3e-g). E12.5 artery
cells were most similar to adult arterial cells, whereas coronary vessel 
plexus cells were most similar to adult capillaries and veins (Extended 
Data Fig. 3h). We also found that E12.5 and adult artery populations 
were enriched for nearly the same artery markers (Supplementary 
Table 1). The exception was Notch1 (enriched only in adults), possibly 
because blood flow upregulates Notch1, and E12.5 is before the onset
of coronary perfusion. Thus, a subpopulation of plexus cells undergoes
a transcriptional shift to resemble mature arteries before the presence 
of arterial vessels or blood flow, prompting us to term them pre-artery 
cells. 
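
Matching each embryonic cell to its most similar adult cell can be sketched as a nearest-neighbour search. The similarity measure used in the study is not specified in this section, so Pearson correlation over a shared gene list is an assumption made here, as are the function names and the toy label set.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (sx * sy)

def match_to_adult(embryonic_cells, adult_cells, adult_labels):
    """For each embryonic cell, return the label of the most correlated adult cell.

    embryonic_cells, adult_cells: lists of expression vectors over the same genes.
    adult_labels: one label per adult cell (for example "artery" or "vein").
    """
    matches = []
    for cell in embryonic_cells:
        best = max(range(len(adult_cells)),
                   key=lambda j: pearson(cell, adult_cells[j]))
        matches.append(adult_labels[best])
    return matches
```

Tallying the returned labels per embryonic cluster gives the kind of summary described above, in which E12.5 artery cells map mostly to adult arterial cells and plexus cells to adult capillaries and veins.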

The scRNA-seq also identified new arterial genes (Extended Data 
Fig. 4a). Slc45a4 marked pre-artery cells at early stages and was later 
specific to mature embryonic arteries (Extended Data Fig. 4b, c). It was 
also enriched in adult coronary artery cells (Extended Data Fig. 4a). 
We found other genes to be enriched in pre-artery cells (Extended
Data Fig. 4d). Of these, Mecom and Igfbp3 marked arteries in adults
(Extended Data Fig. 4d).

Fig. 2 | Pre-artery cells build coronary arteries. a, CX40
immunofluorescence in hearts to mark pre-artery cells (arrowheads).
b, Schematic of pre-artery cells during coronary development. c, CX40+
cells in SV- and endocardium-derived plexus. d, Cx40CreER lineage
labelling (E11.5 induction). n = 7 hearts. e, f, Pre-artery lineage labelling
in arteries (arrowheads) and a subset of capillaries (arrows) at E15.5 (e)


Location, origins, and fate of pre-artery cells 

In late embryonic stages (E17.5), CX40 is specific to mature arteries 
(Extended Data Fig. 5a). By contrast, whole-mount immunostaining 
at early stages revealed that a small population of CX40+ cells first
appeared at E12.5. These cells were interspersed within the intramyocardial
plexus and expanded by E13.5 (Fig. 2a, b). Localization of
additional pre-artery genes confirmed this result (Extended Data
Fig. 5b). The absence of CX40+ cells in the SV and their presence in
the coronary vessel plexus agreed with clustering and pairwise analysis 
showing that pre-artery cells were continuous only with the coronary 
vessel plexus (Fig. 1c). Defining clusters using Seurat showed similar 
results (Extended Data Fig. 6a), although clusters were not as precise 
and were associated with cell cycle genes (Extended Data Fig. 6b, c). 
Thus, coronary angiogenesis involves the specification of single arterial 
endothelial cells within the intramyocardial plexus (Fig. 1b). 

Although our scRNA-seq investigated only SV-derived vessels, lin- 
eage tracing revealed that coronary arteries are derived from both the 
SV and the endocardium (Extended Data Fig. 6d). Single CX40+
cells were detected in the plexus from both sources (Fig. 2c), indicating 
that pre-artery specification occurs during both SV and endocardium 
angiogenesis. 

Cx40CreER Rosa‘! embryos were used for lineage tracing of 
pre-artery cells (tamoxifen, E11.5; Extended Data Fig. 6e, f). CX40+ 
pre-artery cells were later found in arteries, but not veins (Fig. 2d, e). 
A few capillaries were lineage-traced, indicating that pre-artery cells 
could revert to a capillary fate (Fig. 2d). Dosing at E10.5 ensured that 
our result was not due to persistent tamoxifen (Extended Data Fig. 6g), 
and clonal level labelling confirmed the lineage data (Extended Data 
Fig. 6h). Notably, at postnatal day (P)8, the right and left coronary 
artery branches were heavily lineage-labelled in hearts from mice dosed 
at E11.5; only the most distal tips were unlabelled (Fig. 2f and Extended 
Data Fig. 6i). Thus, pre-artery cells build a large portion of mature 
coronary arteries. 

Pre-artery cells first appeared before blood flow, but they were 
abundant in the plexus through E14.5 (Extended Data Fig. 7c), sug- 
gesting that specification could continue after coronary perfusion. 
To investigate this possibility, we used Cre lines that specifically label 
either coronary vessel plexus (ApjCreER) or pre-artery (Cx40CreER) 
cells (Extended Data Fig. 7a) and induced labelling at various times 
(Extended Data Fig. 7b). Labelling of the coronary vessel plexus at E12.5 
or E13.5 lineage-traced a small number of pre-artery cells (Extended 
Data Fig. 7d, e). However, when the coronary vessel plexus was labelled 
at E14.5, there was no tracing into artery main branches and very little 
in the tips (Extended Data Fig. 7f, h). Conversely, labelling at E14.5 with 
Cx40CreER lineage-traced most left and right coronary artery branches 
(Extended Data Fig. 7g, h). Finally, inducing labelling with Cx40CreER 
at E16.5 resulted in few capillary cells being labelled at embryonic and 
postnatal stages (Extended Data Figs. 6j, 7i). These data indicate that 
pre-artery specification occurs in the coronary plexus between E12.5 
and E14.5, creating a progenitor pool that forms virtually all of the 
embryonic left and right coronary artery branches. 

We next investigated whether the artery tips that did not form from 
pre-artery cells (Fig. 2f) arose from pre-existing arteries or through 
capillary differentiation. Induction of ApjCreER and Cx40CreER 
labelling at P2 revealed that artery tips at P6 were composed of 
ApjCreER-lineage cells but depleted of Cx40CreER-labelled cells 
(Extended Data Fig. 6k). Thus, postnatal artery tips grow by capil- 
lary arterialization. 

The morphogenic changes that accompany coronary artery remodel- 
ling are seen after blood flow has been established, and are thought to be 
triggered by shear stress”. In E13.5 Isl1 mutant mice that have delayed 
blood flow’, pre-artery cells had congregated in the region where the 
coronary artery would eventually form (Fig. 2g) and began to increase 
lumen size (Fig. 2h). Therefore, pre-artery cells within the plexus can 
differentiate and initiate remodelling before cues from blood flow. 


Gradual cell fate conversion 

To investigate the vein-to-artery conversion, single cells along the 
SVc–coronary vessel plexus–pre-artery developmental transition were 
projected onto a linear continuum (Fig. 3a, b). Gene expression was then 
visualized by LOESS regression (Fig. 3b-e, g and Extended Data Fig. 8a). 
There was a progressive decrease in venous identity as cells exited the 
SV and moved towards pre-artery (see Coup-tf2, EphB4 and Tie-2 (also 
known as Tek); Fig. 3b). A sharp decrease in venous genes was seen 
in cells that had undergone full pre-artery specification (see Coup-tf2 


lines, SVc expression levels; red shading, pre-artery cells. f, Model based 
on known marker gene patterns. g, Cell cycle genes decreased in pre-artery 
cells. Art, pre-artery; CV, coronary vessel. 


and Apj (also known as Aplnr); Fig. 3b). Arterial gene expression 
showed two patterns: ‘early’ genes, expression of which progressively 
increased in coronary vessel plexus and pre-artery cells (Fig. 3c and 
Extended Data Fig. 8a), and ‘late’ genes, expression of which was low in 
coronary vessel plexus, but increased sharply in pre-artery cells (Fig. 3d 
and Extended Data Fig. 8a). Notch ligands and receptors were early 
genes, with the exception of Hey1, which increased sharply in pre- 
artery cells (Fig. 3e and Extended Data Fig. 8a). These findings suggest 
that the loss of venous identity is initially gradual with a progressive 
increase in arterial identity, and that pre-artery specification occurs 
after a threshold of venous loss and arterial gain has been achieved 
(Fig. 3f). 
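
To make the smoothing step concrete, a minimal sketch of LOESS-style smoothing of one gene along the continuum follows. It is an illustration in Python rather than the analysis code used here: it fits local straight lines with tricube weights (R's loess defaults to local quadratic fits), and the function name, the `frac` span and the toy values are assumptions.

```python
import math

def loess(x, y, frac=0.5):
    """Smooth y against x with locally weighted linear fits (tricube kernel)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))   # neighbours used in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 included).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:                # degenerate window: weighted mean
            fitted.append(swy / sw)
        else:                                 # weighted least-squares line at x0
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw + slope * x0)
    return fitted

# Toy example: position of 11 cells along the continuum (0 = SVc, 1 = pre-artery)
# and a hypothetical "late" arterial gene that rises sharply near the end.
position = [i / 10 for i in range(11)]
expression = [0.1, 0.0, 0.2, 0.1, 0.3, 0.2, 0.4, 0.9, 2.1, 3.0, 3.4]
smoothed = loess(position, expression, frac=0.5)
```

A smaller `frac` follows sharp transitions more closely, and a larger one averages over more of the continuum.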
a, b, E15.5 hearts induced to express Coup-tf2OE or Gfp before pre-artery 
specification. c, d, E15.5 hearts induced to express Coup-tf2OE after 
pre-artery specification. b, Coup-tf2OE, n = 11 hearts; Gfp, n = 11 hearts. 
d, n = 6 hearts. e, f, Coup-tf2OE induction in all endothelial cells before 
pre-artery specification. Control, n = 12 hearts; Coup-tf2OE, n = 20 hearts. 
as mean ± s.d. P value: unpaired two-tailed t-test. NS, P > 0.05; *P < 0.05; 
**P < 0.01; ****P < 0.0001. 
Fig. 5 | COUP-TF2 inhibits artery specification through cell cycle genes. 
a, rPCA plots from E14.5 hearts (wild-type, n = 347 cells; Coup-tf2OE, 
n = 321 cells). Red brackets, artery cells devoid of Coup-tf2 and Apj. 
b, Coronary continuum based on gene expression patterns in a. n = 347 
cells. c, Coup-tf2OE cells do not populate the Coup-tf2−Apj− artery 
population. Wild-type, n = 347 cells; Coup-tf2OE, n = 321 cells. 
d, Progression towards artery is not generally affected by Coup-tf2OE. 
Wild-type, grey lines; Coup-tf2OE, yellow lines. Red-shaded region, 
pre-artery cells. e, Heat map showing the distribution of coronary vessel 


To understand the pre-artery threshold, we performed pathway analysis 
using gene set enrichment analysis (GSEA)**. Most pathways that were 
enriched in plexus over arterial cells were associated with cell cycling 
(Extended Data Fig. 8b). Arterial cells are thought to leave the cell cycle 
in response to blood flow”*-”’; however, pre-artery cells collected before 
blood flow displayed a decrease in cell cycle genes (Fig. 3g, Extended 
Data Fig. 8c, and Supplementary Table 2). In vivo, pre-artery cells were 
less proliferative than the surrounding plexus (Extended Data Fig. 8d). 
Thus, decreased proliferation in arteries is acquired during pre-artery 
specification, and not specifically in response to blood flow. 


COUP-TF2 blocks artery formation 

To investigate whether pre-artery specification was necessary for 
artery formation, we required a tool to block this process. We tested 
COUP-TF2 because it induces venous fate and antagonizes arterial 
fate’*8 and was sharply decreased in pre-artery cells (Figs. 1d, 3b). 
ApjCreER mice were crossed to mice that constitutively express 
Coup-tf2 after Cre recombination”? (Extended Data Fig. 9a) and preg- 
nant dams were treated with tamoxifen to induce overexpression of 
Coup-tf2 (Coup-tf2OE) before pre-artery specification. Cre recombination 
of the Coup-tf2 allele was low, making this experiment a mosaic 
analysis in which Coup-tf2OE cells were followed within wild-type tissue 
(Extended Data Fig. 9b, c). 

Coup-tf2OE cells were present in capillaries and veins, but not arteries 
(Fig. 4a, b, top and Extended Data Fig. 9d). By contrast, control GFP+ 
cells were found in arteries, capillaries, and veins (Fig. 4a, b, bottom). 
Coup-tf2OE cells could survive in arteries when VE-cadherin-CreER 
induced recombination after arteries had formed (Extended Data 
Fig. 9e). Coup-tf2OE cells could also migrate normally onto the heart 
plexus cells in the indicated cell cycle phases. f, Fold increase in control 
GFP or COUP-TF2OE cells between E11.5 and E14.5. Control, n = 8 hearts; 
Coup-tf2OE, n = 5 hearts. g, EdU incorporation in coronary endothelial 
cells from Cdh5CreER Coup-tf2fl/+ hearts. E12.5: control, n = 4 hearts; 
Coup-tf2fl/+, n = 2 hearts. E14.5: control, n = 3 hearts; Coup-tf2fl/+, n = 3 
hearts. h, Cell cycle inhibition reverses the ability of Coup-tf2OE to block 
artery formation (compare to Fig. 4f). Control, n = 6 hearts; Coup-tf2OE, 
n = 6 hearts. P = 0.4167. Data shown as mean ± s.d. Unpaired two-tailed 
t-test. *P < 0.05; **P < 0.01; ***P < 0.001. 


(Extended Data Fig. 9f-h), although they caused a mild increase in ves- 
sel density at E13.5 (Extended Data Fig. 9g). Thus, forced COUP-TF2 
expression before pre-artery specification blocks cells from contribut- 
ing to coronary arteries, suggesting a failure to acquire pre-artery fate. 

Induction of Coup-tf2OE after pre-artery specification with Cx40CreER 
(tamoxifen at E11.5 or E12.5) resulted in numerous Coup-tf2OE 
cells within the artery (Fig. 4c, d, and Extended Data Fig. 9i) that 
expressed the arterial markers CXCR4 and JAG1 (Fig. 4c and Extended 
Data Fig. 9i). Therefore, Coup-tf2OE inhibits arterial fate only before 
pre-artery specification. Pre-artery specification was then blocked 
throughout the entire coronary plexus by inducing widespread Coup-tf2OE 
recombination using Cdh5CreER (tamoxifen at E11.5 and E13.5). 
This resulted in small or completely absent coronary arteries (Fig. 4e, f). 
By contrast, induction of Cdh5CreER-Coup-tf2OE after pre-artery 
specification, but before arterial morphogenesis (tamoxifen at E13.5 
and E15.5), resulted in relatively normal artery development, confirming 
that the later steps in artery formation are not greatly inhibited 
by COUP-TF2 (Fig. 4g). Thus, pre-artery specification is required for 
artery development, and this is the specific differentiation step that is 
antagonized by COUP-TF2 (Fig. 4h). 


COUP-TF2 inhibits pre-artery via cell cycle genes 

We next used scRNA-seq to compare control and Coup-tf2OE cells. 
E14.5 coronary endothelial cells (Extended Data Fig. 10a) were 
analysed as described for E12.5. Coup-tf2OE cells were identified by the 
expression of the transgene’s FLAG-myc tag (Extended Data Fig. 10b, c). 
rPCA revealed a transcriptional continuum linking venous, coronary 
vessel plexus, and arterial cells (Fig. 5a, b). Vein cells in this data set 
expressed Coup-tf2 and Apj and lacked Dll4 and Notch4, as has been 
described for coronary veins*°. Superimposing transgenic cells onto 
the control continuum showed that Coup-tf2OE cells were excluded only 
from the arterial population (Fig. 5c). Venous and arterial genes along 
the continuum were not generally inhibited by Coup-tf2OE (Fig. 5a, d 
and Extended Data Fig. 10d). The defect instead was in the number 
of fully pre-artery or arterial cells, as shown with genes such as Cxcr4 
and Cx40 (Fig. 5a). 

Analysis of differential gene expression did not reveal marked 
changes in the expression of Notch genes, despite the prevailing theory 
that COUP-TF2 functions by antagonizing this pathway (Fig. 5a, d, 
Extended Data Fig. 10d and Supplementary Table 3). Furthermore, 
overexpression of Notch signalling did not rescue the Coup-tf2OE 
phenotype (Extended Data Fig. 9j, k). It is possible that expression levels 
were not high enough to overcome COUP-TF2. Instead, a prominent 
feature of Coup-tf2OE cells was an increase in cell cycle gene expression 
(Supplementary Table 3). Plotting coronary vessel plexus and vein cells 
according to G1/S/G2/M cell cycle staging revealed that the Coup-tf2OE 
population contained more cells with a cycling profile when compared 
to controls (Fig. 5e). 

COUP-TF2 also influenced coronary vessel proliferation. The relative 
increase in Coup-tf2OE cells over developmental time was greater than for 
controls (Fig. 5f). Endothelial deletion of one copy of Coup-tf2 resulted 
in decreased proliferation and expansion of coronary vessels (Fig. 5g and 
Extended Data Fig. 10e). As pre-artery specification was associated with 
decreased proliferation, these data suggest that COUP-TF2 may block 
arterial specification by activating cell cycle genes. 

Next, we sought evidence that cell cycle exit enhances arterial spec- 
ification, and that COUP-TF2 antagonizes this activity. First, cultured 
SV sprouts were treated with a cyclin-dependent kinase (CDK) inhibi- 
tor, which significantly increased artery differentiation (Extended Data 
Fig. 9l, m). Second, a CDK inhibitor was administered to Cdh5CreER 
Coup-tf2OE mice dosed with tamoxifen early to assess whether the 
phenotype of small and absent coronary arteries could be alleviated 
(see phenotype in Fig. 4f). Inhibition of CDKs resulted in no significant 
difference between control and transgenic animals (Fig. 5h), demon- 
strating that the ability of COUP-TF2 to inhibit artery formation had 
been reversed. 


Discussion 

scRNA-seq can reveal developmental transitions at a much higher 
resolution than was previously possible*!-*?. By combining scRNA-seq 
with in vivo localization and genetic manipulations, we show that a 
subset of endothelial cells within the immature coronary plexus crosses 
a transcriptional threshold to become pre-artery cells. Pre-artery spec- 
ification is a critical step because blocking this process inhibited artery 
formation. Prior to pre-artery specification, SV-derived endothelial 
cells gradually decreased expression of venous genes while gradually 
increasing expression of arterial genes. These data suggest that fate 
switching during angiogenesis occurs in a progressive manner, and 
that individual plexus cells that reach a threshold towards full arterial 
differentiation form the mature coronary arteries. 

Although COUP-TF2 is considered a master regulator of veins, 
precisely how it brings about venous fate and suppresses artery fate is 
still under investigation”. Single-cell analysis revealed that COUP-TF2 
did not push cells towards a venous fate or markedly suppress arterial 
genes. Instead, COUP-TF2 specifically blocked pre-artery specification, 
because Coup-tf2OE induction before the pre-artery stage prevented 
mature artery development, whereas induction afterwards had little 
effect. Our data indicate that COUP-TF2 suppresses pre-artery specification 
by activating cell cycle genes. Recently, retinal artery differentiation 
has been shown to depend on cell cycle arrest triggered by blood 
flow, Notch activation, and CX37 (also known as GJA4)””. Pre-artery 
specification was independent of flow, but may engage similar mech- 
anisms. Future experiments should investigate whether this higher- 
resolution understanding of coronary artery differentiation during cardiac 
angiogenesis could aid the development of regenerative therapies. 

METHODS 


Mice. All mice were used in compliance with Stanford University IACUC regulations. 
The following mouse strains were used: wild type (CD1, Charles River Laboratories, 
Strain Code #022), ApjCreER®, RosaCoup-tf2OE 29, RosamTmG Cre reporter (The Jackson 
Laboratory, Gt(ROSA)26Sortm4(ACTB-tdTomato,-EGFP)Luo/J, Stock #007576), RosaNICD 
(The Jackson Laboratory, Gt(ROSA)26Sortm1(Notch1)Dam/J, Stock #008159), RosatdTomato 
Cre reporter (The Jackson Laboratory, B6.Cg-Gt(ROSA)26Sortm9(CAG-tdTomato)Hze/J, 
Stock #007909), Isl1MerCreMer 34, Cdh5CreER®, Cx40CreER*®, Nfatc1“””, RosaConfetti 
(The Jackson Laboratory, Gt(ROSA)26Sortm1(CAG-Brainbow2.1)Cle/J, Stock #013731), 
Coup-tf2 flox (Mutant Mouse Regional Resource Center, B6;129S7-Coup-tf2tm2Tsa/
Mmmh, Stock #032805MU), Apln-lacZ?”, CXCR7-GFP (The Jackson Laboratory, 
C57BL/6-Ackr3tm1Litt/J, Stock #008591), CXCL12-DsRed (The Jackson Laboratory, 
Cxcl12tm2.1Sjm/J, Stock #022458), VE-Cadherin-CreER*®. All mice were maintained 
on a mixed background. 

Timed pregnancies were determined by defining the day on which a plug 
was found as E0.5. For Cre inductions, tamoxifen (Sigma-Aldrich, T5648) was 
dissolved in corn oil at a concentration of 20 mg/ml and was injected into the 
peritoneal cavities of pregnant dams. For cell cycle inhibition, 0.4 mg dinaciclib was 
dissolved in 2.6% DMSO (in PBS) and was injected into the peritoneal cavities of 
pregnant dams. Dosing and dissection schedules for individual experiments were: 
(1) E12.5 single-cell RNA sequencing: tamoxifen on E9.5 and E10.5, dissection 
on E12.5. (2) E14.5 single-cell RNA sequencing: tamoxifen at E11.5 and E12.5, 
dissection at E14.5. (3) ApjCreER Coup-tf2OE experiments: tamoxifen at E9.5 and 
E10.5, dissected at E14.5 or E15.5 for coronary contribution quantification. Same 
dosing schedule, but dissected at E11.5 and E14.5 for recombination rate experi- 
ment (E11.5 only) and expansion experiment. Same dosing schedule, but dissected 
at E11.5, E12.5, or E13.5, was used for ventricular coverage visualization; tamoxifen 
at E11.5 and E12.5, dissected at E15.5 for capillary visualization in Extended Data 
Fig. 9. (4) Cx40CreER Coup-tf2OE experiments: tamoxifen at E11.5 and E12.5 or 
E13.5, dissected at E15.5; for Extended Data Fig. 9i: tamoxifen at E11.5, dissected 
at E15.5. (5) Cdh5CreER Coup-tf2OE before pre-artery: tamoxifen at E11.5 and 
E13.5, dissected at E15.5. (6) Cdh5CreER Coup-tf2OE after pre-artery: tamoxifen 
at E13.5 and E15.5, dissected at E16.5. (7) Cdh5CreER Coup-tf2OE dinaciclib 
experiment: tamoxifen at E11.5 and E13.5, dinaciclib at E12.5, dissected at E15.5. 
(8) Cx40CreER Rosa": tamoxifen at E12.5, dissected at E15.5. (9) ApjCreER 
Coup-tf2OE NICD experiment: tamoxifen at E11.5 and E12.5, dissected at E15.5. 
(10) Cx40CreER RosatdTomato lineage tracing: tamoxifen at E11.5, dissected at E12.5, 
P7 or P8; tamoxifen at E10.5, dissected at E15.5; tamoxifen at E16.5, dissected at 
P8. (11) Cdh5CreER Coup-tf2 flox dosage: tamoxifen at E10.5, dissected at E12.5; 
tamoxifen at E11.5, dissected at E13.5 or E14.5. (12) ApjCreER lineage tracing 
in right or left coronary artery: tamoxifen at E9.5 and E10.5, dissected at E14.5 
and E15.5. (13) Pre-artery cells/Slc45a4 in ApjCreER lineage vessels: tamoxifen 
at E9.5 and E10.5, dissected at E13.5. (14) Additional Cx40CreER and ApjCreER 
lineage-tracing experiments: see Extended Data Fig. 7. (15) VE-Cadherin-CreER 
Coup-tf2OE: tamoxifen at E15.5 and E16.5, dissected at E17.5. 

For additional Cx40CreER Rosa" embryonic lineage-tracing experiments, 
pregnant dams were dosed via oral gavage with 1 mg 4-OH tamoxifen (Sigma- 
Aldrich H6278) at E11.5 and dissected at E12.5 (Extended Data Fig. 6f) or E15.5 
(Fig. 2). 

For postnatal lineage tracing at P2 and P6, tamoxifen was injected into the 
peritoneal cavity of the mother when the neonates were at P2 so that tamoxifen 
could be passed from the mother to the neonates through milk. 

No statistical methods were used to predetermine sample size. For in vitro 
experiments, cultures were randomly chosen for different treatments and exper- 
iments were performed multiple times. Randomization was not relevant to our 
mouse experiments because genotypes/groups were determined by mouse genetics. 
Blinding was used in scRNA-seq and mouse experiments, except for lineage tracing, 
EdU experiments, Coup-tf2OE cell quantification and NICD quantification, where 
blinding was not possible because cells positive for certain markers (MYC tag, GFP, 
tdTomato, EdU) revealed the identities of the samples. 

Cell isolation for scRNA-seq. E12.5 scRNA-seq. SV-derived cells were captured by 
fluorescence-activated cell sorting (FACS) of ApjCreER lineage-labelled cells (Cre 
expressed in SV). An experiment was performed once in which male ApjCreER 
RosamTmG mice were crossed to CD1 females, who were dosed with tamoxifen at 
E9.5 and E10.5. Embryos were removed and placed into cold, sterile PBS at E12.5. 
The SVs of each of 27 GFP-positive hearts were microdissected away from the 
ventricles and pooled into a 300-µl mix consisting of 500 U/ml collagenase IV 
(Worthington #LS004186), 1.2 U/ml dispase (Worthington #LS02100), 32 U/ml 
DNase I (Worthington #LS002007), and sterile DPBS with Mg2+ and Ca2+. The 
ventricles of the 27 hearts were minced with forceps and pooled together in another 
300 µl of the aforementioned mix. The pooled SVs and ventricles were then incubated 
at 37 °C, and gently resuspended every 7 min. After the incubation, 60 µl cold 
FBS followed by 1,200 µl cold sterile PBS were added and mixed into each tube. The 
samples were then filtered through a 70-µm cell strainer; the filter and the source 
tube were washed with a total of 1,200 µl sterile PBS. Cells were then centrifuged 
at 400g at 4 °C for 5 min. Each cell pellet was then gently resuspended in 600 µl 3% 
FBS (in sterile PBS). Cells were centrifuged again at 400g at 4 °C for 5 min. Each 
pellet was then gently resuspended in 2,000 µl 3% FBS and 32 U/ml DNase I in 
sterile PBS. Cells were kept on ice until they were used for FACS. 

DAPI (1.1 µM) was added to the cells immediately before FACS. Single cells 
with a low DAPI signal, moderate PE-Texas Red signal and the highest 
Alexa-Fluor 488 signal were sorted using Aria II SORP (BD Biosciences). Each cell was 
sorted into a separate well of a 96-well plate containing 4 µl lysis buffer. Cells 
were spun down after sorting and stored at −80 °C until cDNA synthesis. A total 
of 480 SV cells and 480 ventricular cells were sorted and processed for cDNA 
synthesis. Cells were analysed on the AATI 96-capillary fragment analyser, and 
a total of 915 cells that had sufficient cDNA concentration were barcoded and 
pooled for sequencing. 

E14.5 scRNA-seq. The experiment was performed once following the same pro- 
cedure as for E12.5 above unless otherwise noted here. 

One thousand, one hundred and fifty-two FACS-captured coronary cells 
lineage-labelled with ApjCreER were collected from E14.5 hearts (SV cells were 
excluded and the later time point used to ensure sufficient numbers of Coup-tf2OE 
cells). To isolate Coup-tf2OE cells, male ApjCreER Coup-tf2OE mice were crossed 
to RosamTmG females who were dosed with tamoxifen at E11.5 and E12.5 and the 
embryos removed at E14.5. A total of 16 GFP-positive embryos from four litters 
were dissected for cell isolation and FACS. To isolate wild-type cells, male 
ApjCreER RosamTmG mice were crossed to CD1 females. Pregnant dams were dosed 
with tamoxifen at E11.5 and E12.5 and embryos removed at E14.5. A total of 12 
GFP-positive embryos from three litters were sorted out and further dissected. For 
both the wild-type and the Coup-tf2OE samples, a few GFP-negative embryos were 
processed for dissection and cell isolation in the exact same manner to serve as a 
negative control for the GFP signal during FACS. 

Cells with the highest Alexa-Fluor 488 signal, low DAPI signal, and low 
PE-Texas Red signal were sorted into lysis buffer. For Coup-tf2OE, a total of 861 
cells were sorted and processed for cDNA synthesis. For wild-type, a total of 608 
cells were sorted and processed for cDNA synthesis. Of these, 1,152 passed cDNA 
fragment quality control (concentration >0.05 ng/µl) and were sequenced. Of 
those, 1,126 passed QC threshold (>1,000 genes, 10° mm10-aligned reads). In 
Coup-tf2OE embryos, 326 cells expressed the FLAG-Myc transgene and were compared 
to the 423 control cells that passed QC. 
cDNA synthesis and library preparation for scRNA-seq. We used Smart-seq2 
to perform scRNA-seq>”. Poly-A mRNA in the cell lysate was converted to cDNA 
and amplified as described**. Amplified cDNA in each well was quantified using 
a high-throughput fragment analyser (Advanced Analytical). After quantification, 
cDNA from each well was normalized to the desired concentration range 
(0.05–0.16 ng/µl) by dilution, consolidated into a 384-well plate, and subsequently used 
for library preparation (Nextera XT kit; Illumina) using a semiautomated pipeline 
as described**“". The distinct libraries resulting from each well were pooled, 
cleaned-up and size-selected using precisely 0.6× to 0.7× volumes of Agencourt 
AMPure XP beads (Beckman Coulter), as recommended by the Nextera XT pro- 
tocol (Illumina). A high-sensitivity Bioanalyzer (Agilent) run was used to assess 
fragment distribution and concentrations of different fragments within the library 
pool. It is important to note that after pooling the libraries and before sequencing 
there is no PCR step in our protocol. Pooled libraries were sequenced on NextSeq 
500 (Illumina). 
Demultiplexing and alignment of scRNA-seq reads. The resulting reads were 1) 
demultiplexed using Illumina’s demultiplexing tool bcl2fastq (default settings), 
and 2) processed using skewer11 for 3’ quality-trimming, 3’ adaptor-trimming, 
and removal of degenerate reads, as described’. The processed reads were mapped 
to the mouse genome (mm10) using STAR (https://github.com/alexdobin/STAR) 
and gene expression was quantified with HTSeq (http://htseq.readthedocs.io/en/ 
release_0.9.1/). The expression of the Coup-TFII-OE transgene was quantified by 
aligning reads to the following sequence, encoding the FLAG-Myc tag: TAAGCT 
TCGTATATACCTTTCTATACGAAGTTGTGGATCTGCGATCTAAGTAAGC 
CGCGGCCATGGACTACAAGGATGACGATGACAAGGCCGCGGCAACTA 
GTAAGCTTGCCGCCATGGAGCAGAAACTCATCTCTGAAGAGGATCTGT. 
Cell subtype discovery with iRPCA. First, low-quality cells were filtered out 
by the following thresholds: >1,000 genes, <40% rRNA, > 10° mm10-aligned 
reads, from 915 sequenced cells. Eight hundred and forty-three cells passed 
quality control. 
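
As a hedged illustration of this filtering step (not the authors' R code), the three thresholds can be applied per cell as below. The `CellStats` record, the function names and the example cells are hypothetical, and the aligned-read cutoff is left as a required argument; the value passed in the example is illustrative only.

```python
from dataclasses import dataclass

@dataclass
class CellStats:
    cell_id: str
    genes_detected: int    # genes with at least one read
    rrna_fraction: float   # fraction of reads assigned to rRNA, 0 to 1
    aligned_reads: int     # reads aligned to mm10

def passes_qc(cell, min_reads, min_genes=1000, max_rrna=0.40):
    """True if the cell clears all three thresholds (strict inequalities)."""
    return (cell.genes_detected > min_genes
            and cell.rrna_fraction < max_rrna
            and cell.aligned_reads > min_reads)

def filter_cells(cells, min_reads):
    """Keep only the cells that pass quality control."""
    return [c for c in cells if passes_qc(c, min_reads)]

cells = [
    CellStats("c1", 2500, 0.10, 800_000),
    CellStats("c2", 900, 0.05, 900_000),    # too few genes detected
    CellStats("c3", 3000, 0.55, 700_000),   # too much rRNA
    CellStats("c4", 1800, 0.20, 20_000),    # too few aligned reads
]
kept = filter_cells(cells, min_reads=100_000)  # illustrative cutoff, an assumption
print([c.cell_id for c in kept])
```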

To identify the broad cell subtypes present, in situ hybridization data on 52 
genes from the Euroexpress*” database were compared to expression levels in 
an rPCA plot of all cells in the data set, excluding erythrocytes (Extended Data 
Fig. 1j). 

Cell subtypes in the ApjCreER-labelled populations were manually defined 
using gene expression patterns in manually selected PC plots derived from multiple 
iterative rounds of rPCA (iRPCA). There were two overall goals of iRPCA. The 
first was to fully describe the cellular subtypes within an scRNA-seq data set while 
minimizing over-clustering of homogenous populations or continua, clustering 
based on cell cycle phase or technical artefacts/cell quality, and under-clustering of 
small subpopulations. The second goal was to preserve continuity or discreteness 
between subpopulations. 

Our pipeline differed from standard pipelines in several ways. First, we used 
rPCA (rrcov::PcaHubert) in lieu of standard PCA. Second, we replaced default PC 
scores by those calculated by the sum of top 60 genes: PC.score = PC.pos − PC.neg 
(Extended Data Fig. 1b, c). These two parameters were used because they provided 
more clearly defined separations among cells with unique gene expression patterns 
(see Extended Data Fig. 1a, b and additional description in main text). Finally, we 
made frequent use of PC pos/neg biplots, which we defined by: 


\[
\mathrm{PC.pos} = \sum_{i=1}^{30} \frac{g_i}{\max(g_i)}, \qquad
\mathrm{PC.neg} = \sum_{j=1}^{30} \frac{g_j}{\max(g_j)}
\]

where g_i are the top 30 genes by positive loading to the PC and g_j by 
negative loading. These were used to identify and exclude cell cycle-associated 
PCs (described below in Identifying cell cycle-regulated genes) (Extended 
Data Fig. 1d) and to inspect for cell doublets (expected to have nearly equal 
levels, on a log scale, of the top markers for two distinct subpopulations; we did 
not see any in our data set, possibly owing to strict FACS gating on FSC-W and 
SSC-W and the large spacing of wells on standard 96-well plates) (Extended 
Data Fig. 1e). 
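
The PC.pos, PC.neg and PC.score definitions above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the custom R scripts: the loadings are taken as given (in the pipeline they come from rrcov::PcaHubert), the function name and data layout are hypothetical, and the three-gene example uses toy values with `n_top=1` only to keep it readable.

```python
def pc_pos_neg(expression, loadings, n_top=30):
    """
    expression: dict gene -> list of per-cell expression values (same cell order)
    loadings:   dict gene -> loading of that gene on one principal component
    Returns (pos, neg, score), each a list with one value per cell.
    """
    ranked = sorted(loadings, key=loadings.get)            # ascending loading
    neg_genes = [g for g in ranked[:n_top] if loadings[g] < 0]
    pos_genes = [g for g in ranked[::-1][:n_top] if loadings[g] > 0]
    n_cells = len(next(iter(expression.values())))

    def scaled_sum(genes):
        total = [0.0] * n_cells
        for g in genes:
            values = expression[g]
            gmax = max(values)
            if gmax <= 0:          # gene undetected in every cell: adds nothing
                continue
            for c, v in enumerate(values):
                total[c] += v / gmax   # each gene scaled to its own maximum
        return total

    pos = scaled_sum(pos_genes)
    neg = scaled_sum(neg_genes)
    score = [p - n for p, n in zip(pos, neg)]              # PC.score
    return pos, neg, score

# Toy example: three cells ordered from vein-like to artery-like.
expression = {"Cx40": [0.0, 2.0, 4.0], "Coup-tf2": [6.0, 3.0, 0.0], "Actb": [5.0, 5.0, 5.0]}
loadings = {"Cx40": 0.8, "Coup-tf2": -0.7, "Actb": 0.0}
pos, neg, score = pc_pos_neg(expression, loadings, n_top=1)
print(score)
```

Plotting `pos` against `neg` for each cell gives the biplot described above, in which a doublet would be expected to score highly on both axes.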

Cell subtype clusters were assigned through the following process. After 
removing a small number of erythrocytes, all cells in the data set were used 
to calculate 15 PCs where the input was all genes minus those in our cell cycle 
category (see Identifying cell cycle-regulated genes) and the output was PC plots 
based on the sum of top 60 genes. Among the resulting 15 PC plots, one was man- 
ually chosen for further analysis based on the following criteria: 1. cells were well 
separated among the PC axes; 2. expression patterns of the top 60 genes revealed 
distinct populations or clusters; and 3. the PC was not highly correlated with 
cell cycle genes (see Identifying cell cycle-regulated genes) or number of genes 
detected (that is, technical artefact). Distinct cell populations within the selected 
PC were manually identified by their separation from other cells within the plots 
and strong correlation with distinct gene expression patterns. One (or more) 
distinct cell population was then removed, and another iteration was performed 
to calculate another set of PCs containing the decreased number of cells. Each of 
these subsequent iterations similarly involved, first, a PC calculation (10-15 PCs 
depending on step), then, a manual selection of one PC plot based on the 
above-described criteria, and, finally, within that selected PC the manual identi- 
fication or removal of cell subpopulations based on the above-described criteria. 
These iterations ended when the calculated PCs revealed a single continuum that 
was arranged in a linear progression on the PC plots, which indicated the pres- 
ence of only two groups of cells: one with high expression of one set of markers 
and the other with high expression of a second set of markers (Extended Data 
Fig. 1k). These last continua were separated into two groups, which comprised 
the final clusters. In this way, a single continuum was not overclustered into 
more than two groups. 

Included in the custom R scripts are the exact steps by which we obtained all 
the reported clusters in the E12.5 data. In the first two rounds, rPCA (rrcov::Pca- 
Hubert, k= 15) was run using all genes expressed in >1 cell, filtered by removing 
ribosomal proteins by grep(Rp[ls]*), as well as Rn45s (also known as Rna45s5), 
Lars2, and Malat1. In all rounds after that, the list of 202 cell cycle genes described 
below was also removed from the gene list. In total, 20 rounds of iRPCA were 
performed to cluster cells into the 10 subpopulations in this work. 

Pairwise discreteness test. To analyse the relationship between pairs of sub- 
populations of cells, the cells of the two subtypes are first projected onto a single 
axis of identity. For the purpose of the following description, these populations 
are referred to as A and B. To do this, cells are scored by their expression of the top 
differentially expressed genes between the two populations. Differential expression 
is calculated as log fold change, fractional difference (difference in fraction of A 
cells expressing minus the fraction of B cells expressing), and Wilcoxon P value; 
genes are filtered by fold change >0.2 (natural logarithm), fractional difference 
>0.05, and P < 10^-7. The top genes, sorted by fold change and fractional difference, 
are referred to as g_A (top n genes enriched in A) and g_B (top n genes enriched 
in B). The results do not vary much for n between 20 and 100 (Extended Data 
Fig. 1l, only low-confidence connections change). In this work, the gene list is 
pre-filtered by removing ribosomal genes (Rp[ls]*) and cell cycle genes (the list 


of 202 cell cycle genes described below). Cells are then given a score x by their 
expression of these genes: 


\[
x = \frac{x_A}{\max x_A} - \frac{x_B}{\max x_B}
\]

where x_A and x_B are a cell's summed expression of the genes in g_A and g_B, 
g is in log10 counts per million (CPM) units and max is the maximum across 
all cells in the pair of subtypes. This scores cells along the axis of cell identity between 
A and B. The resulting distribution of cells along this axis is tested for discreteness, 
or a lack of intermediate cells, by the width of the largest gap between the two 
distributions. The statistic is calculated by the following procedure (Extended Data 
Fig. 1f): 

1. The distribution is fitted to a Gaussian mixture model with two components,
giving means μ_A and μ_B.

2. Cells within the range (μ_A, μ_B) are identified as candidate intermediates.

3. The largest gap distance between candidate intermediate cells, d_max, is
identified.

4. The list of candidate intermediate cells is further restricted to the 10 cells on
either side of d_max, and their gap distances, excluding d_max, are fit to an exponential
with rate k, F(d; k). If there is a uniform distribution of intermediate cells along the
continuum from A to B, the gap distances d_i follow an exponential distribution
P(d) ∝ e^(−kd), where the mean gap distance E[d] = 1/k (equivalent to the mean time
between events for a Poisson process occurring at rate k).

5. The discreteness statistic is calculated as D = log10 F(d_max; k).

6. Two populations are considered discrete if D < −6. In the PlotConnectogram
function, distributions with −3 > D > −6 are connected by semitransparent lines
to indicate lower confidence in their continuity. In simulated data, this
corresponded to 3–5 intermediate cells. Distributions with med(D) > −3 are connected
by 100%-opacity lines to indicate high confidence in their continuity.
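
The gap-based procedure in steps 2 to 6 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R implementation: the component means are passed in rather than fitted by a mixture model, F(d_max; k) is assumed to be the exponential tail probability exp(−k·d_max), and the names `discreteness_statistic` and `classify` are hypothetical.

```python
import math


def discreteness_statistic(scores, mu_a, mu_b, n_flank=10):
    """Return D, the log10 exponential tail probability of the largest gap."""
    lo, hi = sorted((mu_a, mu_b))
    # Step 2: candidate intermediates lie strictly between the two means.
    cand = sorted(s for s in scores if lo < s < hi)
    gaps = [b - a for a, b in zip(cand, cand[1:])]
    if len(gaps) < 2:
        raise ValueError("too few candidate intermediate cells to fit gaps")
    # Step 3: largest gap; gap i separates cand[i] and cand[i + 1].
    i_max = max(range(len(gaps)), key=gaps.__getitem__)
    d_max = gaps[i_max]
    # Step 4: keep up to n_flank cells on either side of the largest gap and
    # collect the gaps among them, which excludes d_max itself.
    left = cand[max(0, i_max + 1 - n_flank): i_max + 1]
    right = cand[i_max + 1: i_max + 1 + n_flank]
    other = [b - a for side in (left, right) for a, b in zip(side, side[1:])]
    if not other or sum(other) <= 0:
        raise ValueError("cannot estimate an exponential rate from these gaps")
    k = len(other) / sum(other)  # maximum-likelihood rate, 1 / mean gap
    # Step 5 (assumed form of F): log10 of exp(-k * d_max).
    return -k * d_max / math.log(10)


def classify(d):
    """Step 6 thresholds as quoted in the text."""
    if d < -6:
        return "discrete"
    if d < -3:
        return "low-confidence continuity"
    return "continuous"
```

Evenly spaced intermediate cells give a value of D near zero, whereas two tight groups separated by a wide empty interval give a strongly negative D.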
Estimating the number of intermediate cells. Second, the number of intermediate
cells connecting the pair of populations is estimated by maximum-likelihood fitting
of a five-parameter distribution. This distribution was derived by considering two
cell types with mean expression values μ_A, μ_B and a transitional population sampled
evenly from the range of values μ_A < μ < μ_B. The exact PDF that describes
sampling from this distribution with Gaussian noise is:

$$P(x; \mu_A, \mu_B, \sigma) = f_A\, N(x; \mu_A, \sigma) + f_B\, N(x; \mu_B, \sigma) + f_{AB} \int_{\mu_A}^{\mu_B} N(x; \mu, \sigma)\, d\mu \qquad (1)$$

where f_A is the fraction of cells in cell type A, f_B is the fraction of cells in cell type
B, and f_AB is the fraction of cells along the A-B continuum. The integral in (1) is
approximated by

$$\int_{\mu_A}^{\mu_B} N(x; \mu, \sigma)\, d\mu \approx \begin{cases} 0, & x < \mu_A - 2.7\sigma \\ Cx + D, & \mu_A - 2.7\sigma \le x < \mu_A + 2.7\sigma \\ 1, & \mu_A + 2.7\sigma \le x < \mu_B - 2.7\sigma \\ Fx + G, & \mu_B - 2.7\sigma \le x < \mu_B + 2.7\sigma \\ 0, & x > \mu_B + 2.7\sigma \end{cases} \qquad (2)$$

where C, D, F, G are calculated to make (2) a continuous function. This PDF is
then fit to the distribution using Nelder-Mead optimization (stats::optim) with five
iterations for different initial values of f_AB. The initial values for μ_A, μ_B, and σ are
derived by fitting with a two-component Gaussian mixture model. f_AB determines
the width of the lines connecting populations in our PlotConnectogram function.
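
For reference, the integral of a Gaussian over its mean has a closed form in terms of the normal CDF Φ: substituting u = (x − μ)/σ gives Φ((x − μ_A)/σ) − Φ((x − μ_B)/σ). The sketch below evaluates that closed form and the three-component density in Python. It is an illustration, not the authors' code, which used a piecewise approximation in R; all function names are hypothetical. The continuum term is left unnormalized, as in equation (1), so it plateaus at 1 between well-separated means.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(x, mu, sigma):
    """Gaussian density N(x; mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def continuum_integral(x, mu_a, mu_b, sigma):
    """Closed form of the integral of N(x; mu, sigma) over mu in [mu_a, mu_b]."""
    return norm_cdf((x - mu_a) / sigma) - norm_cdf((x - mu_b) / sigma)


def mixture_density(x, f_a, f_b, f_ab, mu_a, mu_b, sigma):
    """Two Gaussian components plus the continuum term, as written in (1)."""
    return (
        f_a * norm_pdf(x, mu_a, sigma)
        + f_b * norm_pdf(x, mu_b, sigma)
        + f_ab * continuum_integral(x, mu_a, mu_b, sigma)
    )
```

When the two means are far apart relative to σ, the closed form is within about 0.4% of 0 at μ_A − 2.7σ and of 1 at μ_A + 2.7σ, which is consistent with using 2.7σ as the edge of the transition region.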
Simulation of population distributions for model validation. We optimized the 
cutoffs for the discreteness test using simulated data. The data was simulated by 
drawing from the five-parameter distribution described above under Estimating
the number of intermediate cells, where f_AB ranged from 0 to 1 (Extended Data
Fig. 1h). Using the simulations, we found —6 to be a good cutoff for calling cell 
types discrete—this cutoff is low so as to be sufficiently sensitive to a small number 
of intermediates (~3 intermediate cells out of 150). 
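
Drawing from such a distribution can be sketched as follows. This Python sketch is an assumption-laden illustration rather than the authors' simulation code: the function name, the seed handling, and the sample sizes are hypothetical. Each cell is assigned to A, B, or the continuum with probabilities proportional to f_A, f_B, and f_AB; continuum cells take a mean drawn uniformly between μ_A and μ_B, and Gaussian noise with standard deviation σ is then added.

```python
import random


def simulate_scores(n, f_a, f_b, f_ab, mu_a, mu_b, sigma, seed=0):
    """Draw n identity scores from the two-populations-plus-continuum model."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # Pick the component for this cell with weights f_a, f_b, f_ab.
        component = rng.choices(("A", "B", "AB"), weights=(f_a, f_b, f_ab))[0]
        if component == "A":
            mean = mu_a
        elif component == "B":
            mean = mu_b
        else:
            # Transitional cells are sampled evenly between the two means.
            mean = rng.uniform(mu_a, mu_b)
        scores.append(rng.gauss(mean, sigma))
    return scores
```

A call such as `simulate_scores(150, 0.49, 0.49, 0.02, 0.0, 1.0, 0.05)` gives a population in which roughly 3 of 150 cells are expected to be intermediates.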
Identifying cell cycle-regulated genes. When mentioned in the main text, we
filtered out a list of 202 cell cycle genes from the input to rPCA to reduce the con- 
tribution of cell cycle to heterogeneity. We defined this list by rPCA: cell cycle PCs 
were identified by high loadings of known cell cycle markers (for example, cyclins, 
Mki67, Top2a). Also, cell cycle has a unique pattern on PCi.pos versus PCi.neg 
biplots (described above): there is typically a large coordinated increase in genes 
upon entering cell cycle with little corresponding decrease in genes, and PCi.pos 
has low correlation to PCi.neg. The positive and negative loadings were therefore 
inspected separately for cell cycle genes. In this work, rPCA was performed on a 
highly cycling, relatively homogeneous subgroup of cells (later identified as SVc and 
CV) using all genes; we used the union of the top 60 genes by each of the following 
loadings: PC1-positive, PC2-negative, PC2-negative, PC4-positive, PC5-negative, 
and PC6-negative, which produced a list of 230 candidate cell cycle genes. We
then removed from this list genes that had high loadings to other PCs, marked
subpopulations of cells, and had no cell cycle annotation; these included arterial
markers such as Unc5b. This produced the final list of 202 cell cycle genes. This list
was not complete, but was sufficient to remove cell cycle heterogeneity from the top PCs.
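
Taking the union of the top genes by signed loading can be sketched as below. This is a hypothetical Python illustration, not the authors' R code: loadings are assumed to be available as one gene-to-loading dictionary per PC, and genes whose loading has the wrong sign are skipped, which is an added assumption.

```python
def top_genes(loadings, n, sign):
    """Top-n genes of one PC by positive (sign=+1) or negative (sign=-1) loading."""
    ranked = sorted(loadings, key=lambda gene: sign * loadings[gene], reverse=True)
    # Assumption: only genes whose loading has the requested sign are kept.
    return [gene for gene in ranked[:n] if sign * loadings[gene] > 0]


def candidate_union(pcs, selections, n=60):
    """Union of top-n genes over (pc_name, sign) selections."""
    genes = set()
    for pc_name, sign in selections:
        genes.update(top_genes(pcs[pc_name], n, sign))
    return genes
```

A selection list such as `[("PC1", 1), ("PC2", -1)]` would correspond to PC1-positive and PC2-negative loadings; the candidate set would then still need the manual curation step described above.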
Defining the fetal SVc-CV plexus-arterial axis. We defined the SVc-CV plexus-
arterial axis (x) using the scores generated by PC2 and PC3 from rPCA on SVc,
CV, and arterial cells (Fig. 3a, Extended Data Fig. 4) as below:

$$x = \begin{cases} \mathrm{PC2.score}, & \mathrm{PC3.score} > -\mathrm{PC2.score} - 0.4 \\ \sqrt{\mathrm{PC2.score}^2 + \mathrm{PC3.score}^2}, & \mathrm{PC3.score} < -\mathrm{PC2.score} - 0.4 \end{cases}$$


Figure 3a was coloured by the value of this axis. 
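
The piecewise axis above translates directly into code. The following Python sketch is illustrative only; the function name is hypothetical, and the boundary case PC3.score = −PC2.score − 0.4, which the definition leaves open, is assigned to the second branch as an assumption.

```python
import math


def fetal_axis(pc2_score, pc3_score, offset=0.4):
    """Position of one cell on the SVc-CV plexus-arterial axis."""
    if pc3_score > -pc2_score - offset:
        # Above the dividing line, the axis is the PC2 score itself.
        return pc2_score
    # Below (or, by assumption, on) the line, the axis is the distance from the origin.
    return math.hypot(pc2_score, pc3_score)
```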

Cell cycle scoring. G1/S and G2/M signatures were discovered in an unbiased 
manner as follows: coronary vessel plexus cells from wild-type E14.5 animals were 
analysed with rPCA using all detected genes. Many of the top 60 genes by loading 
to PC3.neg and PC2.neg were known G1/S markers, and, thus, the G1/S score of 
a cell was defined by the sum of the scaled expression of these genes. Many of the 
genes with high loadings to PC4.neg and PC5.neg were known G2/M markers, 
and the G2/M score was calculated by the sum of the scaled expression of these 
genes. Cells were scored as cycling if they were not in the bottom-left modes (high 
expression of at least one cell cycle signature). 

Seurat clustering for comparison. To compare our clustering to Seurat, we ran 
Seurat with primarily default options. We filtered our list of 202 cell cycle genes 
as well as ribosomal proteins from the list of highly variable genes (y.cutoff=0.5) 
and ran PCA with 20 scores calculated. Based on the PC elbow plot, we selected 
the first 10 PCs to be used for clustering. We excluded PC6 for high loading of cell 
cycle genes (since our list of 202 genes was not exhaustive), and clustered using 
FindClusters with resolution 2. We also calculated t-SNE, and we used the t-SNE
medoids of cell clusters to place the vertices for our results from pairwise PCA
(Extended Data Fig. 1l).

Comparison to adult artery-vein continuum. We determined the similarity 
between our E12.5 endothelial cells and the mature artery-vein continuum as
follows. We selected cells from the Tabula Muris data set with the Tissue label 
‘Heart’ and annotation label ‘1. We ran PCA on the most variable genes 
(y.cutoff = 0.35) with the Seurat package. PC2 and PC3 separated cells into three
populations along a single continuum, and we projected cells onto a single axis

$$\mathrm{axis} = \begin{cases} \mathrm{PC2}, & \mathrm{PC2} < 0 \\ \sqrt{\mathrm{PC2}^2 + \mathrm{PC3}^2}, & \mathrm{PC2} > 0 \end{cases}$$

Known arterial genes such as Cx40, Cx37, and Unc5b were negatively correlated,
and known capillary/venous markers such as Apj and Nrp2 were positively
correlated to the axis, so we considered it to be the artery-vein continuum (AVc).
We then calculated the similarity of each fetal cell to each adult cell. To do this,
we used as input the union of the top 300 genes correlated to the adult and fetal
AVc, smoothed by LOESS regression over the AVc defined above. We calculated the
Pearson correlation similarity using these features, and mapped each fetal cell to
the adult cell to which it was most similar by this metric.

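
The final mapping step, assigning each fetal cell to its most similar adult cell by Pearson correlation, can be sketched as follows. This Python sketch assumes each cell is already represented by a vector of the selected, smoothed gene features; the function names and the handling of constant vectors are assumptions, not part of the published pipeline.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length feature vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        return 0.0  # assumption: a constant vector carries no similarity signal
    return cov / (sd_u * sd_v)


def map_to_adult(fetal, adult):
    """Map each fetal cell id to the id of the most correlated adult cell."""
    return {
        fetal_id: max(adult, key=lambda adult_id: pearson(vec, adult[adult_id]))
        for fetal_id, vec in fetal.items()
    }
```

Each fetal cell's position on the adult continuum can then be read off from the adult cell it maps to.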

Immunohistochemistry and Imaging. For whole-mount embryonic hearts. All 
embryos were fixed in 4% PFA at 4°C with shaking and washed twice (10 min each 
wash) with PBS at room temperature with shaking before dissection for whole- 
mount immunostaining. 

Intact embryonic hearts were washed in PBT (PBS with 0.5% Triton-X 100) 
at room temperature for one hour before incubation with primary antibodies. 
Primary antibodies were dissolved in either 5% goat serum or 5% donkey serum in 
PBT. Hearts were incubated in the solution with primary antibodies with shaking 
overnight at 4°C. Hearts were then washed with PBT for six to nine hours with 
shaking at room temperature, and the wash was changed every hour. Hearts were 
then stained with secondary antibodies with the same conditions and procedure 
as for primary antibodies. After washing off the secondary antibodies, hearts were 
then left in enough PBT to cover them. Two drops of Vectashield (Vector Labs, 
H1000) were added and mixed with the PBT for each heart, and the hearts were 
stored at −20°C for the long term. Imaging was done with Zeiss LSM-700 (10x
or 20x objective lens) with Zen 2010 software (Zeiss). 

For whole-mount postnatal hearts. Hearts were fixed in 4% PFA for 1 h at 4°C with 
shaking and washed twice (15 min each wash) with PBS at 4°C with shaking before 
dissection for whole-mount immunostaining. In the primary antibodies (diluted 
in PBT), hearts were shaken at room temperature for 6 h and overnight at 4°C. To 
wash the primary antibodies, hearts were shaken in PBT at room temperature for 
10 h and overnight at 4°C. Hearts were washed in 50 ml PBT and the wash was 
changed every 2 h while shaking at room temperature. Hearts were then placed 
in secondary antibodies (diluted in PBT) at room temperature with shaking for 
6 h and overnight with shaking at 4°C. Hearts were then washed in 50 ml PBT
for 8 h (wash changed every 2 h) and overnight at 4°C. The washing was repeated 
for six more days. Prior to imaging, Vectashield (Vector Labs, H1000) was added 
to hearts in clean tubes, and hearts were equilibrated at room temperature for 
40 min. Imaging was done with Zeiss LSM-700 (10x or 20x objective lens) with 
Zen 2010 software (Zeiss). 

Primary and secondary antibodies. The following primary antibodies were used 
at the indicated concentrations: MYC-Tag for COUP-TF2OE (Cell Signaling
Technology, Inc., 2278S, 1:300), VE-Cadherin (BD Pharmingen, 550548, 1:125),
VEGFR2 (R&D Systems, AF644, 1:125), CX40 (Alpha Diagnostic International,
CX40A, 1:300), ERG (Abcam, ab92513, 1:500), CXCR4 (BD Pharmingen, 551852,
1:125), GFP (Abcam, ab13970, 1:500), VWF (Abcam, ab6994, 1:500), CLDN11
(Abcam, ab53041, 1:1,000), SOX17 (R&D Systems, AF1924, 1:500), anti-actin
α-smooth muscle-FITC (Sigma, F3777, 1:200), VEGFR3 (R&D Systems, AF743,
1:125), DACH1 (Proteintech, 10914-1-AP, 1:500), JAG1 (R&D Systems, AF599, 
1:125). 

All secondary antibodies were Alexa Fluor conjugates (488, 555, 633, 635, 594,
647, Life Technologies, 1:125 or 1:250). DAPI (1 mg/ml) was used at 1:500.
In situ hybridization. To identify the broad cell subtypes in the E12.5 single cell 
data set, expression levels in rPCA plots of 52 genes were compared to in situ 
hybridization data from the Euroexpress and Allen Brain Atlas databases (stages
ranged from E11.5 to E15.5). Expression patterns from E14.5 Euroexpress data are
shown in Extended Data Fig. 1i.

For Adm and Fbln2, in situ hybridization on paraffin sections was performed
twice as described previously. Antisense Adm and Fbln2 probes were labelled with
digoxigenin (DIG)-UTP using the Roche DIG RNA labelling System according
to the manufacturer’s guidelines.

For Slc45a4, whole hearts were fixed and in situ hybridization performed according 

to protocol from Additional File 2 of ref. “4. Probes were Cdh5 (Advanced Cell 
Diagnostics 312531-C2), Cx40 (Advanced Cell Diagnostics 518041), and Slc45a4 
(Advanced Cell Diagnostics 522131-C3). Reagents are RNAscope Protease III 
& IV Reagents (Advanced Cell Diagnostics 322340) and RNAscope Fluorescent 
Multiplex Detection Reagents (Advanced Cell Diagnostics 320851). About 12 
embryonic hearts were dissected in a sterile and RNase-free environment into a 
1.5-ml tube and fixed in 1 ml 4% PFA for 1 h at room temperature. Three fixed 
hearts were processed in the same tube with 100 µl of the probes master mix. The
experiment was performed three times, once each for E13.5 (n=3), E14.5 (n=2), 
and E15.5 (n=3). 
SV-atria explant experiment. The experiment was performed three times. In total, 
71 embryos were dissected at E12.5. The SV and atria of each embryo were dis- 
sected on sterile PBS and gently dropped onto a cell culture insert (EMD Millipore 
PI8P01250) coated with Matrigel (BD Biosciences) inside a well of a 24-well plate. 
Two to five explants were cultured onto each insert. Immediately after the explants 
were dropped onto the insert, 200 µl EGM2-MV medium was added into the
space between the insert and the well. The SVs were allowed to attach onto the
Matrigel at 37°C for 2-6 h before another 200 µl EGM2-MV medium was added
to the space between the insert and the well. The explants were cultured at 37°C
for approximately 72 h before either flavopiridol or DMSO was added: 900 µl of
40 nM flavopiridol (dissolved in 0.1% DMSO in EGM2-MV) or 0.1% DMSO in 
EGM2-MV (drug vehicle control) was added to each insert. After addition of 
either flavopiridol or DMSO, explants were incubated at 37°C for approximately 
48 h before they were fixed and stained. 

Each cell culture insert was fixed in 1,000 µl 4% PFA for 2 h at 4°C without
shaking. Then, each insert was washed with 1,000 µl PBS three times at room
temperature. Five hundred microlitres of primary antibodies (diluted in 0.5% PBT) 
were added onto each insert and inserts were incubated at room temperature with 
shaking for 4-6 h. The inserts were subsequently washed with PBS at room tem- 
perature with shaking for 2 h. Five hundred microlitres of secondary antibodies 
(diluted in 0.5% PBT) were added onto each insert and inserts were incubated at 
4°C for about 16 h. The inserts were then washed with PBS three times at room 
temperature with shaking for 2 h. The membrane containing the SVs was then 
excised from the insert and mounted onto a drop of Vectashield on a slide and 
stored at −20°C. Imaging was done using a Zeiss LSM-700 (10x or 20x objective
lens) with Zen 2010 software (Zeiss). 
Acquisition and processing of images. All images were acquired with Zen 2010
software (Zeiss). Images were prepared using Photoshop CS6 (Adobe). Any 
changes to brightness and contrast were applied equally across the entire image. 
In vivo EdU Assay. To measure in vivo proliferation rate, 50 µg/g body weight
of EdU was injected into pregnant mice intraperitoneally 2-3 h before embryo 
collection. EdU-positive cells were detected using a Click-iT EdU kit (Invitrogen, 
C10338) according to the manufacturer’s instructions. In brief, Click-iT reaction 
cocktails were incubated for 30 min after the secondary antibody incubation of 
the immunostaining protocol. 

Quantification and statistical analysis of confocal images. See Supplementary 
Methods for details. 

Code availability. The custom R scripts used to analyse the scRNA-seq data are
publicly available on GitHub (https://github.com/gmstanle/coronary-progenitor-scRNAseq).

Data availability. Raw scRNA-seq data are available at https://github.com/gmstanle/ 
coronary-progenitor-scRNAseq. Figures associated with the raw data are Figs. 1, 3, 4, 
and Extended Data Figs. 1-8. There is no restriction on data availability. Source Data 
for Figs. 2, 4, 5, and Extended Data Figs. 6-9 are provided with the paper. 
Extended Data Fig. 1 | Single cell analysis of ApjCreER lineage
labelled cells. a, Comparison of rPCA and classical PCA at separation 

of subpopulations. PC scores were selected to best separate the Enppt 
Esam− population. Cells are coloured by expression (log10 CPM, scaled
to maximum per gene). n = 352 cells. b, Comparison of default and 
sum-of-60 modified PC scores. PC2 is the default PC score from rPCA; 
PC2.score is the modified sum-of-top-60 scores (expression is log10 CPM,
scaled to maximum). Y-axis is the number of genes detected per cell (>1 
count). n = 426 cells. c, Comparison of default and sum-of-top-60 scores. 
Scores were chosen that best separated the Vwf+ and Cxcr4+ populations.
n= 426 cells. d, Unique cell cycle signature on PC.pos/PC.neg biplots. 
PC1.pos (PC1.neg) is the sum of the top 30 genes by positive (negative) 
loading to PC1. Cells are coloured by expression. Lower panel is the same 
rPCA after removing the list of 202 cell cycle genes. Numbers in bold are 
the correlations between PC1.pos and PC1.neg. n = 674 cells. e, PC.pos/ 
PC.neg biplot showing theoretical location of doublets expressing high 
levels of both gene sets. f, Schematic of the pairwise discreteness test on a 


discrete (left) and continuous (right) pair of subpopulations. 

g, FACS plots used to isolate GFP-positive cells (red box) from ApjCreER 
Rosa" hearts at E12.5. h, Top, discreteness statistic generated by 
pairwise discreteness test as a function of number of intermediate cells 
(nint) for simulated distributions. Bottom, pairwise distributions of cell 
clusters in the data set and the fraction of intermediate cells estimated by 
pairwise discreteness analysis. i, rPCA plots and their accompanying gene 
expression patterns in the embryonic heart as reported by Euroexpress. In 
situ hybridization images show whole hearts (top); insets of specific areas 
are in lower panels with relative expression levels indicated. Expression 
levels in rPCA plots range from 0 (yellow) to 4 (brown) in log10 CPM. Top,
n= 843 cells. j, Summary of broadly defined cell populations as indicated 
by gene expression patterns. n = 843 cells. k, Example of manual clustering 
process. For i, n = 732 cells; ii, n =531 cells; iii, n = 415 cells; iv, n= 284 
cells; v, n = 261 cells. l, Comparison of pairwise discreteness test results for
different numbers of genes per cell type signature (n).
whole-mount confocal immunofluorescence (c) of selected genes. For g, Colour coding showing subpopulations that were used to calculate
Extended Data Fig. 3 | Characterization of pre-artery cells. a-d, rPCA
plots of the E12.5 SVc-CV continuum. Each dot is an individual cell, and 
gene expression levels are indicated by the colour spectrum as shown in 
Fig. 1d, which reflects log10 CPM. a, Arterial genes highly enriched in
the arterial areas of the plot. b, Arterial genes significantly upregulated 
in, but not specific to, the arterial area of the plot. c, Venous genes highly 
depleted in the arterial areas of the plot. d, Venous genes downregulated, 
but not depleted, in the arterial area of the plot. For a-d, Bonferroni- 
adjusted P < 0.01; PCA plots, n = 415 cells. Centre and error bars are 
mean ± s.e.m. of log CPM expression values. e, Genes expressed in adult
coronary artery cells. Data are from the Tabula Muris consortium. n = 445
cells. f, Assignment of artery, capillary, and vein in adult coronary cells 
based on gene expression enrichment in e. n= 445 cells. g, Schematic for 
comparing E12.5 coronary cells to those along the adult artery-capillary-vein
continuum. h, Results of experiment schematized in g. The centre
line corresponds to the median; the upper and lower hinges correspond to
the first and third quartile, respectively; the whiskers extend to the largest 
value or to 1.5 x IQR (inter-quartile range, or distance between quartiles), 
whichever is smaller. Pre-artery cells: n = 20 cells. CV: n= 277 cells. 
P=6.2 x 107}. Statistical test is two-tailed. 
Extended Data Fig. 4 | Novel artery markers identified in scRNA-seq
data. a, E12.5, E14.5, and adult coronary cell rPCA plots with genes highly 
enriched or specific to the arterial area during development. Each dot is 
an individual cell, and gene expression levels are indicated by the colour 
spectrum as shown in Fig. 1d, which reflects log10 CPM. Genes in bold red
are also enriched in adult artery cells. b, Fluorescence in situ hybridization 
(RNAscope) for Slc45a4, which is expressed (arrowheads) in vessels
positive for the arterial marker Cx40, but not in Cx40-negative capillaries 
(arrows). c, Slc45a4 expression in pre-artery cells derived from the SV


lineage (ApjCreER lineage-labelled; arrowheads). d, Genes enriched in, 
but not specific to, arterial cells at E12.5 and E14.5. Genes in bold red are 
arterial specific in both the developing and adult heart. In a and d: for PCA 
plots, n=415 cells (top, E12.5); n = 347 cells (middle, E14.5); n = 445 cells 
(bottom, adult). For bar graphs E12.5, Art, n = 20 cells; CV, n =277 cells; 
SV, n= 118 cells; E14.5, Art, n= 70 cells; CV, n = 454 cells; SV, n= 144 
cells. Centre and error bars are mean ± s.e.m. of log CPM expression
values. Dots represent individual cells. Scale bars, 100 µm.
Extended Data Fig. 5 | Additional whole mount immunofluorescence
of marker genes. a, CX40 whole-mount immunohistochemistry in late 
gestation hearts (E17.5). CX40 is expressed only in cells lining large 
arteries and arterioles (overlapping blue and green signal). Low level, 
non-arterial signal is in myocardial cells. b, rPCA plots from E12.5 and 
E14.5 with accompanying whole-mount immunofluorescence in E13.5
hearts. VWF is enriched in the SV, while APLN-nlacZ signal and DACH1
are present throughout the coronary plexus. CXCR4, ACKR3-GFP, and
CXCL12-DsRed are enriched in the pre-artery and artery areas of rPCA
plots and are interspersed within the intramyocardial coronary plexus.
n = 415 cells (left, E12.5); n = 347 cells (right, E14.5). Scale bars, 100 µm.
Extended Data Fig. 6 | Clustering and additional lineage analysis of
pre-artery cells. a, Clusters and relationships generated by rPCA and 

the pairwise discreteness test (left) and clusters generated by the Seurat 
pipeline (Louvain/SNN clustering, resolution = 2) (right). n = 757 cells. 
b, Violin plots show that arterial gene enrichment and venous gene de- 
enrichment are better with manual, iterative clustering, suggesting that 
this method leads to more precise populations. c, Violin plots of cell 
cycle genes in the two CV plexus clusters generated by the indicated 
algorithms. Seurat clusters are more defined by cell cycle differences than 
iterative rPCA (iRPCA) clusters. b, c, Violin plots were made using Seurat
VlnPlot. Each violin plot is one subtype and each dot corresponds to a
cell. d, Quantification of SV and endocardium contributions to coronary 
arteries. Error bars show s.d. ApjCreER RCA: n = 11 hearts. ApjCreER
LCA: n = 6 hearts. Nfatc1Cre RCA: n = 5 hearts. Nfatc1Cre LCA: n = 5
hearts. Centre, mean. e, Experimental design to lineage trace pre-artery
cells. f, Lineage labelling in E12.5 Cx40CreER Rosa'@°™*"? hearts induced 
with tamoxifen at E11.5. g, Arterial lineage labelling in hearts induced 

at E10.5. h, Example of clones in Cx40CreER Rosa" heart at E15.5. 
Tamoxifen was administered at E12.5. Two groups of cells sharing the 
same fluorescent label (clones) are present: YFP-labelled (yellow circle) 


and nGFP labelled (green circle). Clone sizes are very small, consistent 
with low proliferation rates in pre-arterial cells. i, P8 heart lineage from 
Cx40CreER Rosa'#™"° mice dosed with tamoxifen at E11.5. Heavy 
lineage labelling of the left coronary artery is shown (LCA). Arrowheads 
indicate branches of the right coronary artery. Myocardium (myo) of the 
left ventricle is also Cx40* at E11.5, and is also lineage labelled. j, Images 
from P8 Cx40CreER Rosa'™"° hearts dosed with tamoxifen at E11.5 

or E16.5. Only the E11.5 dosage results in capillary labelling (arrows) 
resulting from reversion of pre-artery cells that differentiate during the 
burst of pre-artery specification between E12.5 and E14.5. Arrowheads 
point to arterial lineage labelling. k, Postnatal lineage tracing in ApjCreER 
Rosa'#'o™° or Cx40CreER Rosa!" hearts where tamoxifen was injected 
at P2. Tips of arteries are lineage labelled with ApjCreER Rosa‘, but 
are depleted of Cx40CreER Rosa” label, indicating that artery tips 
can extend by incorporating capillary cells that differentiate into arterial 
endothelial cells. Unpaired two-tailed t-test was used to calculate P values. 
For ApjCreER, n= 78 artery tips at P2, n=41 artery tips at P6. P=4.4608 
x 107'°. For Cx40CreER, n= 81 artery tips at P2, n= 49 artery tips at P6. 
P=1.61705 x 10~"*. Error bars show s.d. *****P < 0.0001. Centre is mean. 
Scale bars, d, e, f, j, k, 100 µm; g, 50 µm.
that Apj and Cx40 mark cells before and after pre-artery specification,
respectively. n= 415 cells. b, Schematic of lineage tracing experiments. 
Extended Data Fig. 9 | Effect of COUP-TF2 overexpression during
coronary vessel development. a, Schematic of transgenes used to 

study Coup-tf2 overexpression in coronary cells. b, c, Recombination 

is not complete in the SV with tamoxifen at E9.5 and E10.5 as shown 

in whole-mount confocal images (b) and quantification (c). Control 

GFP is visualized by direct fluorescence, and COUP-TF2 through 
immunostaining for the myc tag. For c, ApjCreER Rosa”, n=5 hearts. 
ApjCreER Coup-tf2°", n=6 hearts. d, Tamoxifen dosing at E11.5 and 
E12.5 fills capillaries with recombined cells, but still resulted in Coup-tf2OE
cells being excluded from arteries (A). e, Induction of Coup-tf2OE
throughout vasculature shows that overexpressing cells can exist in 
arteries. f, Quantification of ventricle coverage at E12.5.n =4 control 
hearts, n=7 COUP-TF2 hearts. ns, P> 0.05. P=0.8868. g, Whole- 
mount confocal images of control and Coup-tf2™ hearts at different stages 
of development. Coronary migration (dotted line) on the dorsal side of the 
ventricle (outlined with solid line) is similar in both genotypes. 

h, High magnification of E12.5 Coup-tf2°" heart shown in g highlights the 
positioning of transgenic cells at both the leading front and trailing cells. 

i, COUP-TF2° cells can become part of the JAG-1-positive artery 

if induced after pre-artery specification with Cx40CreER. j, Mosaic 


experiment in which constitutive expression of the NOTCH intracellular 
domain (NICD) is induced at the same time as Coup-tf2°". This 
manipulation creates a vasculature containing three different transgene 
combinations: 1. NICD; 2. COUP-TF2; or 3. NICD + COUP-TF2°F 
(arrowheads). Those containing just the NICD (category 1) are the only 
transgenic cells that contribute to arterial vessels. k, Quantification of the 
percentage of endothelial cells in capillaries and arteries (Art) with the 
three transgenic combinations. NICD-expressing cells preferred arteries 
whereas COUP-TF2OE cells avoided arteries, the latter of which was not
rescued by NICD. n=6 hearts. **P < 0.01; ****P < 0.0001. For NICD 
capillary versus artery, P=0.0070. For COUP-TF2° capillary versus 
artery, P=7.49224 x 10~°. For COUP'TF2° + NICD capillary versus 
artery, P= 8.07734 x 107°. 1, The CDK inhibitor flavopiridol increased 
arterial specification (Cx40) in an SV sprouting assay. n = 33 control 
explants, n = 38 treated explants. ***P < 0.001. m, Immunostaining 

of endothelial sprouts (VE-cadherin) migrating from SV or atria 

tissue explants with Cx40 showed the increase in this arterial marker 
(arrowheads) with flavopiridol treatment. Data shown as mean +s.d. A 
two-tailed unpaired t-test was performed to determine P values. Scale 
bars, b, 20 µm; d, e, g-j, 100 µm; m, 25 µm.
Extended Data Fig. 10 | Gene expression curves in E14.5 control and
Coup-tf2° cells. a, FACS plots of the GFP-marked cells from control 

and Coup-tf2OE hearts that were processed for scRNA-seq. b, The criterion for
identifying Coup-tf2OE cells was >1 read of the flag and myc sequences
included in the transgene. c, Comparing the number of flag and myc reads 
in control and Coup-tf2°" hearts confirms the specificity of this parameter 
for transgenic cells. Control: n = 409 cells. COUP-TF2°: n=714 cells. 
The centre line corresponds to the median; the upper and lower hinges 
correspond to the first and third quartile, respectively; the whiskers extend 


to the largest value or to 1.5 x IQR (inter-quartile range, or distance 
between quartiles), whichever is smaller. d, Expression of genes from the 
indicated categories along the vein-CV plexus-arterial axis. The x-axis 

has individual cells organized as shown in Fig. 5b. Lines are LOESS curves 
of gene expression and raw data points are shown as dots. Shaded region 
represents the 95% confidence interval of the LOESS curve. e, Hypoplastic 
coronary vasculature with heterozygous deletion of Coup-tf2 in endothelial 


cells. Scale bars, 100 µm.
Patients with prostate cancer frequently show resistance to androgen-deprivation therapy, a condition known as
castration-resistant prostate cancer (CRPC). Acquiring a better understanding of the mechanisms that control the 
development of CRPC remains an unmet clinical need. The well-established dependency of cancer cells on the tumour 
microenvironment indicates that the microenvironment might control the emergence of CRPC. Here we identify 
IL-23 produced by myeloid-derived suppressor cells (MDSCs) as a driver of CRPC in mice and patients with CRPC. 
Mechanistically, IL-23 secreted by MDSCs can activate the androgen receptor pathway in prostate tumour cells, promoting
cell survival and proliferation in androgen-deprived conditions. Intra-tumour MDSC infiltration and IL-23 concentration
are increased in blood and tumour samples from patients with CRPC. Antibody-mediated inactivation of IL-23 restored
sensitivity to androgen-deprivation therapy in mice. Taken together, these results reveal that MDSCs promote CRPC by
acting in a non-cell autonomous manner. Treatments that block IL-23 can oppose MDSC-mediated resistance to castration
in prostate cancer and synergize with standard therapies.


Prostate cancer is the most commonly diagnosed cancer in males in 
the world and the second leading cause of mortality in males that is 
attributable to cancer’. After it was shown that androgens and androgen 
receptor (AR) signalling promote prostate cancer progression, androgen- 
deprivation therapy (ADT) has become the main prostate cancer ther- 
apy for patients at different stages of disease”. However, a considera- 
ble fraction of patients receiving such treatments ultimately progress 
to a more aggressive disease, developing CRPC?. The prognosis for 
patients with CRPC remains poor and the treatment of these patients 
remains a major unmet medical need”-®. A better understanding of the 
mechanisms that drive CRPC could identify more effective therapies. 
Deregulated AR signalling, induced by genomic amplification of the 
AR locus, AR splice variants and activation of co-regulators of the AR, 
is considered the major determinant of CRPC?. Activation of several 
AR-alternative signalling pathways also promotes CRPC®!°. However, 
these mechanisms suggest that cell-autonomous alterations occur in 
prostate tumour cells and do not take into consideration that these cells 
are surrounded by a complex tumour microenvironment. The well- 
established dependency of cancer cells on the tumour microenviron- 
ment"! suggests that the non-cancer-cell component of the tumour may 
control prostate cancer progression, although the contribution of the 
tumour microenvironment, and in particular of the tumour immune 
response to the emergence of CRPC, remains unknown!*!3. We and 
others have previously reported that MDSCs are a prominent immune 
cell subset infiltrating the CRPC microenvironment’*"!*. MDSCs 
are a heterogeneous population of activated immune cells that are 
expanded in pathological conditions, including cancer, and that have 
potent immunosuppressing activity’’. On the basis of their expression 
markers, MDSCs can be classified into monocytic MDSCs or polymorphonuclear 
(PMN)-MDSCs!*. Higher numbers of circulating and tumour-infiltrating MDSCs 
have been observed in a large fraction of patients who have different types 
of tumours including prostate 
cancer!”"!. MDSCs can support tumorigenesis by either suppressing 
the antitumour immune response or by promoting angiogenesis and 
senescence evasion in a number of contexts including prostate 
cancer!>-1621. MDSCs have also been found to be increased in patients that 
do not respond to ADT”°. However, whether MDSCs support andro- 
gen-independent tumour growth and the emergence of CRPC remains 
unknown. Here, we show that IL-23 secreted by increased numbers 
of MDSCs in both human and mouse prostate tumours can confer 
androgen independence in a non-cell autonomous manner through 
the activation of AR signalling. Inhibition of IL-23 or IL-23 receptor 
signalling in these tumours restores sensitivity to ADT. 


MDSCs confer castration resistance 

By analysing biopsies from patients with castration-sensitive prostate 
cancer (CSPC) and CRPC, we found that PMN-MDSCs (CD11b+CD33+CD15+ 
cells)18 were enriched in CRPC and localized in close proximity 
to EpCAM+ epithelial tumour cells (Fig. 1a and Extended Data 
Fig. 1a). Notably, increased PMN-MDSCs in tumours were not associated 
with increased levels of CD11b+CD15− cells (Extended Data 
Fig. 1b). Therefore, we hypothesized that tumour-infiltrating PMN- 
MDSCs could directly contribute to the emergence of CRPC. We 
investigated this hypothesis using the Pten-null prostate conditional 
knockout (Pten^pc−/−) mouse model and two additional allograft 
models of prostate cancer (TRAMP-C1 and MyC-CaP) that develop 
tumours driven by Pten loss, p53 and RB inactivation, and MYC ampli- 
fication, respectively”. As previously reported”, surgical castration of 
Pten^pc−/− mice leads to initial tumour regression (castration-sensitive 
phase, t = 4 weeks) followed by tumour progression and emergence 
Fig. 1 | MDSCs infiltrate CRPC paralleling the activation of AR 
pathway and conferring resistance to castration to prostate cancer. 
a, CD11b+CD33+CD15+ PMN-MDSCs within the tumours of patients 
with CSPC or CRPC. Left, representative images of CSPC and CRPC from 
patient 1. Right, quantification. EpCAM, yellow; CD15, green; CD33, 
red; CD11b, pink; DAPI, blue. n = 51 biologically independent patients 
per group reported; data are mean ± s.e.m. Statistical analyses (negative 
binomial regression model): P < 0.001. b-d, Pten^pc−/− mice that were 
sham-operated (Sham) or surgically castrated (CTX) at 
different time points. b, Tumour volume of the anterior prostate lobe. 
c, Quantitative PCR with reverse transcription (qRT-PCR) analyses of 
the indicated genes in the prostate tumours at t = 4 weeks (castration- 
sensitive phase (CS)) and t = 12 weeks (castration-resistant phase (CR)). 
d, Flow cytometry for tumour PMN-MDSCs (gated on CD45+ cells). 
e, Percentages of tumour-infiltrating immune cell populations (gated 
on CD45+ cells). f, Experimental schematic. CS-FBS, charcoal-stripped 

of castration-resistant prostate tumours (castration-resistant phase, 
t=12 weeks) (Fig. 1b and Extended Data Fig. 1c, d). AR target genes 
are downregulated in tumours in the castration-sensitive phase and 
upregulated in the castration-resistant phase in castrated mice com- 
pared to sham-operated mice (Fig. 1c). To assess whether castration 
FBS. g, TRAMP-C1 cell proliferation. h, Percentage of annexin V− and 
7-aminoactinomycin D (7AAD)-negative TRAMP-C1 cells. i, Percentage 
of annexin V+ and 7AAD+ TRAMP-C1 cells. j, RT-PCR analyses of the 
indicated genes in TRAMP-C1 cells. k, Experimental schematic. PMA, 
phorbol myristate acetate. l, Proliferation of LNCaP cells. m, Volume 
of prostate tumours of CTX Pten^pc−/− mice treated with the CXCR2 
antagonist or untreated at the end of the study (12 weeks after CTX). 
n, qRT-PCR analyses of the indicated genes in the prostate tumours of 
mice treated as in m. Specific n values of biologically independent mice 
(b-d, m, n) and independent samples (g-j, l) are shown and data are 
mean ± s.e.m. b, d, h, i, l, m, Statistical analyses (unpaired two-sided 
Student's t-test): NS, not significant; *P < 0.05; **P < 0.01; ***P < 0.001. 
b, d, Statistical analyses between all groups and time points (two-sided 
one-way ANOVA): P < 0.001. c, g, j, n, Statistical analyses (paired two- 
sided Student's t-test): *P < 0.05; **P < 0.01; ***P < 0.001. 


affects the recruitment of PMN-MDSCs in these tumours, we measured 
the frequency of PMN-MDSCs (CD11b+ Ly6G^bright Ly6C^low cells)!* in 
sham-operated and castrated Pten^pc−/− mice in a time course experiment. 
Notably, the number of PMN-MDSCs increased over time, paralleling 
the emergence of CRPC (Fig. 1b, d and Extended Data Figs. 1e, 2a). 
Of note, PMN-MDSCs represented the major subset of immune cells 
that increased in Pten-null tumours upon castration (Fig. 1e and 
Extended Data Fig. 2b, c). This increase in PMN-MDSCs was validated 
in TRAMP-C1 and MyC-CaP castrated mice that develop CRPC within 
10 days after castration (Extended Data Figs. 1f-i, 2a). Whereas PMN- 
MDSCs increased in castrated tumours, the frequency of tumour- 
infiltrating macrophages (TAMs) decreased (Fig. 1e and Extended Data 
Fig. 2c). 

To assess whether factors secreted by MDSCs affect AR pathway 
signalling in prostate cancer cells, we co-cultured two mouse androgen- 
dependent prostate cancer cell lines, TRAMP-C1 and MyC-CaP, in 
the presence of conditioned medium obtained from bone marrow 
(BM)-derived MDSCs (Fig. 1f and Extended Data Fig. 3a, b). Notably, 
the conditioned medium of MDSCs sustained the proliferation and 
survival of cells cultured under full androgen deprivation (FAD), 
enhancing the transcription of AR target genes (Fig. 1g-j and Extended 
Data Fig. 3c-f). These results were further validated in both androgen- 
dependent (LNCaP and VCaP) and androgen-independent (22Rv1 
and PC3) human prostate cancer cell lines cultured in the presence of 
conditioned medium from human BM-MDSCs (Fig. 1k, l and Extended 
Data Fig. 3g-i). Taken together, these data demonstrated that MDSCs 
can regulate, in a paracrine manner, androgen-deprivation sensitivity 
in prostate tumour cells. We next assessed whether depletion of MDSCs 
could delay the emergence of CRPC in castrated mice. We therefore 
treated castrated Pten^pc−/− mice, TRAMP-C1 and MyC-CaP allograft 
mice with AZD5069, a selective CXCR2 antagonist that is under clinical 
evaluation (Clinical Trial NCT03177187, https://clinicaltrials.gov/ct2/show/NCT03177187). 
Treatment with the CXCR2 antagonist strongly 
reduced the tumour infiltration of PMN-MDSCs in all of the mouse 
models that were analysed (Extended Data Figs. 3j, 4a, e). Notably, 
whereas Pten^pc−/− castrated mice treated with the CXCR2 antagonist 
did not progress to CRPC, untreated mice developed CRPC four 
months after castration as demonstrated by the levels of AR target genes 
(Fig. 1m, n). This finding was also confirmed in TRAMP-C1 and MyC- 
CaP allograft mice, in which inhibition of MDSC recruitment in the 
tumour delayed the emergence of CRPC as shown by decreased tumour 
size and level of AR target genes in treated mice, resulting in longer sur- 
vival in mice that were treated with the CXCR2 antagonist (Extended 
Data Fig. 4a-h). Of note, treatment with the CXCR2 antagonist did 
not directly affect cell proliferation and AR activity in mouse prostate 
tumour cells cultured in vitro in FAD (Extended Data Fig. 3k, l). To 
corroborate the role of MDSCs as drivers of CRPC in the human 
setting, we co-injected human BM-MDSCs with LNCaP cells in NOD/ 
SCID mice, and assessed tumour growth over time in intact versus 
castrated hosts. In line with our previous results, the co-injection of 
LNCaP with human BM-MDSCs conferred resistance to castration 
(Extended Data Fig. 4i, j). Taken together, these data indicated that 
MDSCs are increased in CRPC and can promote proliferation of 
prostate tumour cells by sustaining AR signalling following androgen 
deprivation. 


IL-23 drives insensitivity to androgen deprivation 

To determine which MDSC-secreted factors drive castration resist- 
ance, we performed a NanoString nCounter gene expression assay in 
Pten^pc−/− tumours from sham and castrated mice. IL-23 and one of the 
subunits of IL-23 receptor (IL12Rβ1) were the most upregulated genes 
in tumours from castrated mice compared to controls (Extended Data 
Fig. 5a). Of note, factors that had previously been linked to the regulation 
of AR pathway, such as IL-6, were not upregulated in Pten^pc−/− 
tumours after castration!’ (Extended Data Fig. 5a). In line with this 
evidence, cytokine profile analysis of conditioned medium from mouse 
MDSCs showed that IL-23 was the most overexpressed factor produced 
by these cells (Extended Data Fig. 5b). Immunofluorescence and flow 
cytometry analyses further confirmed that tumour-infiltrating MDSCs 
expressed IL-23 in vivo, with PMN-MDSCs that infiltrated castration- 
resistant tumours expressing even higher levels of IL-23 compared to 
treatment-naive tumours (Fig. 2a, b). Moreover, similar to the results 
Fig. 2 | Tumour-infiltrating MDSCs produce IL-23 that drives 
insensitivity to androgen deprivation. a-c, Prostate tumours from 
Pten^pc−/− sham-operated or CTX mice analysed at t = 12 weeks. 
a, Representative images of Ly6G+IL-23+ cells (nuclei, blue) and 
representative dot plot of Ly6G+IL-23+ cells gated on CD45+ cells in 
CTX mice. b, c, Representative histograms (left) and quantification 
(right; mean ± s.e.m.) showing the mean fluorescence intensity (MFI) 
of IL-23 in CD45+CD11b+Ly6G+ cells (b) and MFI of IL-23R gated on 
CD45−EpCAM+ cells (c). n = 3 biologically independent mice per group. 
d, Representative images of IL-23+, CD15+, EpCAM+ cells within the 
tumours of patients with CRPC. a, d, Data were validated in at least three 
experiments. e, CD33+IL-23+CD11b+CD45+ cells within the tumours 
of patients with CSPC or CRPC (n = 4 biologically independent patients 
per cohort). f, IL-23 levels in the plasma of patients with CSPC (n = 20) 
and CRPC (n = 120). g, Correlation analyses of the numbers of tumour- 
infiltrating PMN-MDSCs and plasmatic IL-23 levels in patients with 
CRPC (n = 28). Statistical analyses (negative binomial regression model): 
P < 0.001. h, i, Tumour progression (t = days after castration) of NSG mice 
carrying TRAMP-C1 allografts treated with isotype control (untreated; 
n = 4), anti-CSF1R antibody (n = 5) or CXCR2 antagonist (n = 5). 
h, Average tumour volume. i, Survival curves reported as a Kaplan-Meier 
plot. Statistical analyses (log-rank (Mantel-Cox) test): **P < 0.01. 
j, TRAMP-C1 cell proliferation. k, Percentage of annexin V− and 7AAD− 
TRAMP-C1 cells. l, Percentage of annexin V+ and 7AAD+ TRAMP-C1 
cells. m, qRT-PCR analyses of the indicated genes in TRAMP-C1 cells. 
n, Cell proliferation of 3D cultures of reported organoids. Recombinant (r) 
IL-23 conditions were normalized to the none or FAD treatment. 
e, f, h, j-n, Data are mean ± s.e.m. Specific n values of biologically 
independent samples are shown in j-n. b, c, e, f, h, j-l, n, Statistical 
analyses (unpaired two-sided Student's t-test). m, Statistical analyses 
(paired two-sided Student's t-test): *P < 0.05; **P < 0.01; ***P < 0.001. 
Fig. 3 | IL-23-IL-23R signalling axis regulates resistance to castration 
in prostate cancer in vivo and in vitro. a, Magnetic resonance imaging 
scans of representative Pten^pc−/−Il23a^WT and Pten^pc−/−Il23a^KO mice at the 
endpoint (top). Waterfall plot depicting proportional change in tumour 
response for Pten^pc−/−Il23a^WT (n = 3) and Pten^pc−/−Il23a^KO (n = 3) 
mice (bottom). Data are mean ± s.e.m. Statistical analyses: unpaired 
two-sided Student's t-test: *P < 0.05; one-way ANOVA: P = 0.0008. 
b, Representative haematoxylin and eosin staining at the endpoint. 
Scale bars, 100 μm. Data are representative of at least three biologically 
independent mice. c, Quantification of adenocarcinoma, prostatic 
intraepithelial neoplasia (PIN) or normal-like glands in Pten^pc−/−Il23a^WT 
(n = 3) and Pten^pc−/−Il23a^KO (n = 3) mice. d, Quantification of Ki-67+ 
cells in Pten^pc−/−Il23a^WT (n = 4) and Pten^pc−/−Il23a^KO (n = 6) mice. 
e, RT-PCR analyses of the prostate tumours. f, g, Tumour volume and 
survival curves of TRAMP-C1 Il23a^WT and TRAMP-C1 Il23a^KO mice. 
f, Statistical analyses (unpaired two-sided Student's t-test followed by 
Wilcoxon signed-rank test): *P < 0.05. h, RT-PCR analyses in the 
tumours of TRAMP-C1 Il23a^WT or TRAMP-C1 Il23a^KO mice. 


found in mice, PMN-MDSCs that infiltrated tumour biopsies from 
patients with CRPC expressed IL-23 (Fig. 2d, e and Extended Data 
Fig. 6a). In addition, the frequency of IL-23-producing tumour- 
infiltrating PMN-MDSCs was higher in CRPC biopsies than in CSPC 
biopsies (Fig. 2e). Notably, expression of CXCL5, a chemokine that 
stimulates chemotaxis of myeloid cells through CXCR2”, was strongly 
upregulated in castrated tumours compared to controls (Extended Data 
Fig. 5a, c, d). This, together with the finding that CXCR2 inhibition 
efficiently decreases the recruitment of MDSCs in castrated mice, indi- 
cates that CXCL5 is a major regulator of MDSC recruitment in CRPC. 

We next assessed the levels of the IL-23 receptor (IL-23R) in tumours 
from sham and castrated Pten^pc−/− mice, and found that IL-23R levels 
i, j, NSG males challenged with TRAMP-C1 or TRAMP-C1 Il23r^KO cells 
after CTX were treated with isotype control (untreated) or anti-IL-23 
antibody and monitored for tumour progression (i) and survival (j). 
Western blot of IL-23R in TRAMP-C1 (control) or TRAMP-C1 Il23r^KO 
cells is shown in the inset (performed at least twice). g, j, Statistical 
analyses (log-rank (Mantel-Cox) test): **P < 0.01; ***P < 0.001. 
k, Quantification of pSTAT3(Y705)+ cells in Pten^pc−/−Il23a^WT (n = 4) and 
Pten^pc−/−Il23a^KO (n = 6) mice. l, m, Western blot (l) and quantification (m) 
of RORγ, pSTAT3(Y705) and total STAT3 levels in prostate tumours. 
n, TRAMP-C1 cell proliferation. o, RT-PCR analyses in TRAMP-C1 
cells. c, d, k, Data are mean ± s.e.m. of one tumour per mouse (mean of 
three sections per tumour, >3 fields per section). e-k, m-o, Data are 
reported as mean ± s.e.m. Specific n values of biologically independent 
mice (i-k) and biologically independent samples (m-o) are shown. 
c, d, i, k, m, n, Statistical analyses (unpaired two-sided Student's t-test). 
e, h, o, Statistical analyses (paired two-sided Student's t-test): *P < 0.05; 
**P < 0.01; ***P < 0.001. 


increased in tumour cells following castration (Fig. 2c). This was fur- 
ther validated in TRAMP-C1 cells cultured in androgen-deprived 
conditions in vitro (Extended Data Fig. 5e, f). Furthermore, plasma 
levels of IL-23 in patients with CRPC were substantially higher than 
in patients with CSPC (Fig. 2f and Extended Data Fig. 5g) and sta- 
tistically correlated with tumour-infiltrating PMN-MDSC counts 
(EpCAM−CD11b+CD33+CD15+ cells; Fig. 2g) but not with other 
myeloid cell population counts (CD11b+CD15− cells; Extended Data 
Fig. 6b). Overall, these data demonstrate that IL-23 is increased in 
both mouse and human CRPCs, with IL-23 levels correlating with the 
number of tumour-infiltrating PMN-MDSCs. Moreover, tumour biopsies 
from patients with CRPC also had much higher IL23A (which 
Fig. 4 | IL-23 inhibition improves ENZA efficacy in vivo. 
a, Pten^pc−/− mice with castration-resistant prostate cancer (12 weeks after 
castration) were randomly assigned to treatments in a preclinical study. 
Treatments: isotype control (untreated), anti-IL-23 antibody (100 ng per 
mouse injected intraperitoneally weekly), ENZA (30 mg kg⁻¹ per day 
administered daily by oral gavage on a Monday to Friday schedule) and 
ENZA in combination with anti-IL-23 antibody (ENZA + anti-IL-23). 
b, Histological score. n = 3 biologically independent mice. Statistical 
analyses (two-way ANOVA): P < 0.001. c, Fold increase in the volume of 
the anterior lobe of the prostate (fold change over the untreated group). 
d, Representative haematoxylin and eosin (H&E) and Ki-67 staining in 
the tumours at completion of the study. Scale bars, 50 μm. e, qRT-PCR 
analyses of the indicated genes in the prostate tumours of CTX Pten^pc−/− 
mice at completion of the preclinical study. Statistical analyses (two-sided 
paired Student's t-test): *P < 0.05; ***P < 0.001. f, Representative 
cleaved-caspase 3 staining in the tumours after one week of treatments. Scale bars, 
50 μm. g, Quantification of cleaved-caspase 3 (percentage of the total 
number of cells within the glands). d, f, Data are representative of at least 
three biologically independent mice. b, g, Data are mean ± s.e.m. of one 
tumour per mouse (mean of three sections per tumour, three or more 
fields per section). b, c, e, g, Data are mean ± s.e.m. Specific n values of 
biologically independent mice are shown in c, e, g. c, g, Statistical analyses 
(unpaired two-sided Student's t-test): NS, not significant; *P < 0.05; 
**P < 0.01; ***P < 0.001. 


encodes IL-23) and IL23R mRNA levels than treatment-naive patients 
(Extended Data Fig. 6c, d) and IL23A mRNA levels were linked to 
MDSC-associated mRNA levels in biopsies from patients with CRPC 
(Extended Data Fig. 6e, f). 

In line with this evidence, the primary source of IL-23 in tumours 
from castrated Pten^pc−/− mice was the population of PMN-MDSCs 
(Extended Data Fig. 6g). Note that TAMs and epithelial tumour cells 
were only a small fraction of the total IL-23+ cells in these tumours 
(Extended Data Fig. 6g). In keeping with this, IL-23 levels significantly 
decreased in tumours from castrated Pten^pc−/− mice that were depleted 
of MDSCs (Extended Data Fig. 6h), whereas the depletion of TAMs by 
an anti-CSF1R antibody” in mice with TRAMP-C1 allografts did not delay 
the emergence of CRPC. Conversely, the reduction of PMN-MDSCs by 
a CXCR2 antagonist robustly delayed the emergence of CRPC (Fig. 2h, i 
and Extended Data Fig. 6i). Overall, therefore, IL-23 levels significantly 
decreased in tumours depleted of MDSCs but not of TAMs (Extended 
Data Fig. 6j). 

To functionally validate these findings, we cultured prostate tumour 
cells in the presence of conditioned medium from BM-MDSCs from 
Il23a wild-type (IL23^WT BM-MDSCs) or Il23a knockout mice (IL23^KO 
BM-MDSCs). The conditioned medium of IL23^WT BM-MDSCs, as 
well as treatment with recombinant IL-23, promoted the proliferation 
and survival of, and increased the transcription of AR target genes in, 
prostate cancer cells kept in FAD, whereas the conditioned medium of 
IL23^KO BM-MDSCs was unable to affect these parameters (Fig. 2j-m). 
Of note, deleting IL-23 in BM-MDSCs did not affect the levels of 
other secreted factors in these cells (Extended Data Fig. 7a). Indeed, 
IL23^WT and IL23^KO BM-MDSCs had equal immunosuppressive 
capabilities (Extended Data Fig. 7b). These results were further validated 
in a subset of androgen-dependent organoids that were derived from 
patient-derived xenografts and LNCaP cells kept in FAD and treated in 
the presence or absence of human recombinant IL-23 (Fig. 2n). Taken 
together, these findings identify IL-23 as an MDSC-secreted factor that 
can sustain the proliferation and survival of prostate cancer cells as well 
as the transcription of AR-driven target genes in prostate cancer cells 
cultured in FAD. 


MDSCs activate the IL-23-RORγ pathway 

To determine whether MDSC-derived IL-23 promotes the emergence 
of resistance to castration in prostate cancers in vivo, we next reconstituted 
lethally irradiated sham-operated or castrated Pten^pc−/− mice 
with bone marrow precursors from Il23a^WT or Il23a^KO mice (yielding 
Pten^pc−/−Il23a^WT and Pten^pc−/−Il23a^KO mice, respectively; Extended 
Data Fig. 8a). Mice were reconstituted with bone marrow that was 
depleted of T, B and natural killer (NK) cells (Extended Data Fig. 8b). 
The absence of IL-23 in the myeloid compartment led to a marked 
reduction in the tumour volume of the prostate cancers specifically 
in castrated Pten^pc−/− mice (Fig. 3a and Extended Data Fig. 8c), 
with normalization of glands that are affected by prostate cancer 
and a major reduction in Ki-67+ cells (Fig. 3b-d and Extended Data 
Fig. 8d). Notably, AR target genes were robustly downregulated in 
prostate tumours from Pten^pc−/−Il23a^KO mice compared to tumours from 
Pten^pc−/−Il23a^WT mice (Fig. 3e). These data were then also validated 
in TRAMP-C1-allograft irradiated mice reconstituted with Il23a^WT 
and Il23a^KO bone marrow precursors (yielding TRAMP-C1 Il23a^WT 
and TRAMP-C1 Il23a^KO mice; Extended Data Fig. 9a). In TRAMP-C1 
Il23a^KO mice, the absence of IL-23 in the myeloid compartment significantly 
delayed the emergence of CRPC as demonstrated by decreased 
tumour size and tumour cell proliferation as well as reduced AR-driven 
target gene expression and significantly increased survival of the 
TRAMP-C1 Il23a^KO mice (Fig. 3f-h and Extended Data Fig. 9b, c). 
Reconstitution with Il23a^KO bone marrow did not alter the recruitment 
of MDSCs into the tumours and spleens of reconstituted mice 
(Extended Data Fig. 9d, e). Of note, treatment with anti-IL-23 antibodies 
or genetic inactivation of IL-23R in NOD/SCID/γ (NSG) 
TRAMP-C1 allografts confirmed these results (Fig. 3i, j), demonstrating 
that IL-23 directly promotes resistance to castration in prostate 
cancer by binding to IL-23R that is present on tumour cells. 

IL-23 has been reported to regulate the activation of STAT3-RORγ 
expression in naive CD4 T cells’; both STAT3 and RORγ can affect 
AR signalling in prostate cancer”*”?. We therefore evaluated whether 
IL-23 secreted by MDSCs affected the STAT3-RORγ signalling axis 
in prostate cancer in a non-cell autonomous manner. Inactivation of 
IL-23 in the myeloid compartment of castrated Pten^pc−/− mice significantly 
decreased overall tumour levels of phosphorylated (p)STAT3 and 
RORγ in vivo (Fig. 3k-m and Extended Data Fig. 8d); this was also the 
case in the TRAMP-C1 model (Extended Data Fig. 9f-h). RORγ inhibition 
in vitro also abrogated the proliferative advantage conferred by 
conditioned medium from MDSCs or IL-23 treatment in TRAMP-C1 
cells kept in FAD, and inhibited the transcription of the full-length form 
of the AR and its constitutively active splice variant (ARv4) as well as 
downstream AR target genes (Fig. 3n, o). Taken together, these data 
demonstrate that IL-23 released by MDSCs into the tumour microenvironment 
acts directly on the pSTAT3-RORγ signalling axis to drive the 
transcription of AR and its splice variant and downstream target genes, 
thus favouring the proliferation and survival of the prostate cancer cells 
in androgen-ablation conditions. 


IL-23 targeting improves the efficacy of ADT 

To evaluate the therapeutic relevance of our findings, we next assessed 
whether IL-23 inhibition by antibody blockade could reverse resistance 
to castration in prostate cancer in Pten^pc−/− mice. Because anti-IL-23 
antibodies are currently being evaluated in clinical trials for the treatment 
of autoimmune diseases*”, including psoriasis, and are clinically 
well tolerated*!, we treated mice carrying Pten^pc−/− tumours that had 
become resistant to surgical castration with an anti-IL-23 antibody in 
combination with the AR antagonist enzalutamide (ENZA; Fig. 4a). 
ENZA is a standard treatment for patients with CRPC after primary 
ADT?*”. Our preclinical study showed that anti-IL-23 increased the 
efficacy of ENZA (Fig. 4b, c); in mice treated with anti-IL-23 and 
ENZA, we observed a normalization of prostate glands that were 
affected by cancer (Fig. 4b and Extended Data Fig. 10a), with decreased 
tumour volume (Fig. 4c) and proliferation (Extended Data Fig. 10a, b), 
whereas in mice that were treated with ENZA alone, the treatment was 
ineffective. Combined anti-IL-23 and ENZA were associated with a 
robust inhibition of the AR activity and induction of apoptosis of the 
tumour cells (Fig. 4e-g). Taken together, these data demonstrate that 
anti-IL-23 treatment can reverse resistance to castration in prostate 
cancer and enhance the efficacy of ENZA. 


Discussion 

Our study has identified IL-23 production by MDSCs as a driver of 
CRPC and adds novel mechanistic insights on how prostate cancers 
can become insensitive to androgen deprivation and AR blockade. 
We also report on a different role for MDSCs in cancer, describing an 
unexpected function for this immune subset. Previous data demon- 
strated that MDSCs can support tumorigenesis in many cancers 
through different mechanisms!*!”!8, with preclinical studies indi- 
cating that the inactivation of MDSCs increased immune-checkpoint 
blockade efficacy in CRPC models!®. IL-23 has also been previously 
implicated in cancer progression in the context of a different tumour 
type as a regulator of the pro-tumour immune response*>*°. However, 
to our knowledge, the discovery described here, that IL-23 produced 
by MDSCs regulates resistance to castration in prostate cancer by 
sustaining AR signalling, was previously unknown, and adds novel 
mechanistic insights on how these immune cells support tumorigen- 
esis. This work also shows that inhibition of IL-23 can reverse ADT 
resistance in men suffering from advanced prostate cancer (Extended 
Data Fig. 10c). 

In conclusion, we describe an alternative immunotherapeutic strat- 
egy for treating advanced prostate cancer that, unlike most other 
treatments, is not focused on re-activating the function of cytotoxic T 
lymphocytes against tumour cells. Immunotherapeutic strategies that 
reactivate cytotoxic T cells by immune-checkpoint blockade have been, 
to date, only active against a small subset of prostate cancers that are 
characterized by DNA-repair defects and higher neoantigen loads with 
increased infiltration of T lymphocytes***”. Our results demonstrate, 
on the other hand, that MDSCs are a major player in the endocrine 
resistance in prostate cancer and that immunotherapies that either 
block MDSC recruitment into the tumour or directly inhibit 
IL-23 can be effective therapeutic strategies for patients that 
have this lethal and common disease. Because anti-IL-23 antibodies 
have been well-tolerated in clinical trials involving patients with auto- 
immune diseases”, these deserve to be clinically evaluated in men that 
have lethal prostate cancer. We envision that this immunotherapeutic 
strategy targeting paracrine IL-23 in combination with established 
endocrine anticancer treatments is highly likely to improve treatment 
outcomes for this common male cancer. 
METHODS 


Animals. All mice were maintained under specific pathogen-free conditions 
in the IRB facility and experiments were performed according to 
national guidelines and regulations. All animal experiments were approved 
by the local ethics committee (TI13/2015 and T104/2017). Male C57BL/6, 
FVB, NSG and NOD/SCID mice 6-8 weeks of age were purchased from Jackson 
Laboratories (Envigo) and acclimatized for at least a week before experiments. 
Male C57BL/6 IL-23p19KO (Il23a^KO) mice’ were provided by F. Sallusto 
(IRB, Bellinzona) and used at 8 weeks of age. Male Pten^pc−/− mice were generated 
and genotyped as previously described!”. Female Pten^loxP/loxP mice were 
crossed with male PB-Cre transgenic mice and genotyped for Cre using the following 
primers: primer 1 (5'-AAAAGTTCCCCTGCTGATGATTTGT-3') and 
primer 2 (5'-TGTTTTTGACCAATTAAAGTAGGCTGTG-3') for Pten^loxP/loxP; 
primer 1 (5'-TGATGGACATGTTCAGGGATC-3') and primer 2 
(5'-CAGCCACCAGCTTGCATGA-3') for PB-Cre. Surgical castration was performed under 
anaesthesia with isoflurane. Male Pten^pc−/− mice were 9-10 weeks old at the time of 
castration. Mice were monitored postoperatively for recovery from anaesthesia 
and checked daily for four days postoperatively. Surgical skin clips were removed 
on postoperative day 5. Mice undergoing treatment were administered control 
vehicle or therapeutic doses of the appropriate agents. Any mouse that showed 
signs of distress or lost more than 15% of its initial weight during treatment was 
euthanized by CO2 asphyxiation. At the completion of study, mice were euthanized 
by CO2 asphyxiation and tissue was collected for histology, mRNA and protein 
analysis and single-cell suspensions for flow cytometry. For allograft experiments, 
2.5 × 10⁶ TRAMP-C1 cells, 2.5 × 10⁶ TRAMP-C1 Il23r^KO cells or 2 × 10⁶ MyC-CaP 
cells were injected subcutaneously into the flank of male C57BL/6, C57BL/6 or FVB 
mice, respectively. For xenograft experiments, 3 × 10⁶ LNCaP cells were suspended 
with or without 3 × 10⁶ human BM-MDSCs in a total volume of 100 μl PBS and 
Matrigel (1:1) and implanted subcutaneously into the flank of NOD/SCID mice. 
When tumours were approximately 100 mm³, mice were randomized to the treatment 
groups. Tumour growth was monitored daily by measuring the tumour size 
with a calliper. The tumour volume was estimated by calculating 4/3π(R1 × R2 × R3), 
where R1 and R2 are the longitudinal and lateral radii and R3 is the thickness of the 
tumour that protrudes from the surface of normal skin. Animals were euthanized 
when the tumour reached approximately 600 mm³. The local ethics committee 
approved the conduct of the in vivo experiments with maximal tumour sizes 
of 1,000 mm³. 
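
As an illustration of the calliper-based estimate described above, the following Python sketch computes the tumour volume and compares it with the enrolment and endpoint sizes given in the text. It assumes that the expression denotes the standard ellipsoid formula, 4/3 × π × R1 × R2 × R3, with all radii in mm. The function names and the use of exact cut-offs are illustrative assumptions (the text gives the sizes only as approximate), and the sketch is not the authors' analysis code.

```python
import math

# Sizes taken from the text; treating them as exact cut-offs is an
# assumption of this sketch (the protocol says "approximately").
ENROLMENT_MM3 = 100.0  # size at which mice were randomized
ENDPOINT_MM3 = 600.0   # size at which animals were euthanized


def tumour_volume_mm3(r1_mm: float, r2_mm: float, r3_mm: float) -> float:
    """Ellipsoid volume 4/3 * pi * R1 * R2 * R3.

    r1_mm, r2_mm: longitudinal and lateral radii (mm).
    r3_mm: thickness of the tumour protruding from the skin (mm).
    """
    if min(r1_mm, r2_mm, r3_mm) < 0:
        raise ValueError("radii must be non-negative")
    return 4.0 / 3.0 * math.pi * r1_mm * r2_mm * r3_mm


def study_status(volume_mm3: float) -> str:
    """Hypothetical helper mapping a volume onto the stages named in the text."""
    if volume_mm3 >= ENDPOINT_MM3:
        return "endpoint"
    if volume_mm3 >= ENROLMENT_MM3:
        return "on study"
    return "below enrolment size"


if __name__ == "__main__":
    volume = tumour_volume_mm3(3.0, 3.0, 3.0)
    print(f"{volume:.1f} mm^3 -> {study_status(volume)}")
```

Keeping the formula and the thresholds in separate functions means the cut-offs can be changed without touching the volume calculation.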

Treatments. The CXCR2 antagonist (AZD5069; AstraZeneca) was administered 
with daily intraperitoneal injections at a dose of 100 mg kg⁻¹ on a 
Monday through Friday schedule. Control animals received vehicle. Enzalutamide 
(APExBio) was administered daily by oral gavage with a dose of 30 mg kg⁻¹ per 
day on a Monday through Friday schedule. Rat anti-IL-23 antibody (100 ng per 
mouse; G23-8; IgG1, kappa; eBioscience) or rat IgG1 isotype control (eBioscience) 
was administered weekly via intraperitoneal injection. For in vivo depletion of 
macrophages, mice were treated with 400 μg anti-CSF1R (clone BE0213, BioXCell; 
on Mondays, Wednesdays and Fridays). 
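
The dosing regimens above can be sketched in Python as a per-animal dose derived from a weight-based figure, together with the weekday schedules. The sketch assumes that the doses are expressed in mg per kg of body weight. The 25 g body weight and all function and constant names are hypothetical and serve only as illustration, because the text does not report individual animal weights.

```python
from datetime import date

# Weekday numbers follow datetime.date.weekday(): Monday = 0 ... Sunday = 6.
MON_TO_FRI = {0, 1, 2, 3, 4}  # CXCR2 antagonist and enzalutamide schedule
MON_WED_FRI = {0, 2, 4}       # anti-CSF1R schedule


def dose_per_mouse_mg(dose_mg_per_kg: float, body_weight_g: float) -> float:
    """Convert a mg/kg dose into mg for one animal of the given weight."""
    if dose_mg_per_kg < 0 or body_weight_g <= 0:
        raise ValueError("dose must be >= 0 and body weight must be > 0")
    return dose_mg_per_kg * body_weight_g / 1000.0


def is_dosing_day(day: date, schedule: set) -> bool:
    """True if the calendar day falls on one of the scheduled weekdays."""
    return day.weekday() in schedule


if __name__ == "__main__":
    weight_g = 25.0  # hypothetical mouse weight, not taken from the text
    print("enzalutamide, mg per mouse:", dose_per_mouse_mg(30.0, weight_g))
    print("CXCR2 antagonist, mg per mouse:", dose_per_mouse_mg(100.0, weight_g))
    print("Saturday dosing?", is_dosing_day(date(2018, 7, 14), MON_TO_FRI))
```

The fixed-amount treatments (100 ng anti-IL-23 and 400 μg anti-CSF1R per mouse) need no weight conversion and are therefore left out of the calculation.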

Cell lines. The TRAMP-C1, MyC-CaP, LNCaP, VCaP, 22Rv1 and PC3 cell lines 
were obtained from the ATCC and no other authentication method was performed. 
The TRAMP-C1 Il23r^KO cell line was generated in the laboratory with CRISPR- 
Cas9 methodology and authenticated by western blotting and FACS for the 
deletion of IL-23R. All cell lines were regularly tested for mycoplasma (MycoAlert 
Mycoplasma Detection kit). 

Bone marrow reconstitution. Bone marrow was flushed from the femurs of male 
C57BL/6 or IL-23p19KO mice under sterile conditions with RPMI 1640 using a 
21-gauge needle. Mononuclear cells were filtered, collected and checked for viability 
using trypan blue. Before transplantation, bone marrow-derived cells were 
depleted of CD3+ T cells, NK1.1+ NK cells and CD19+ B cells by magnetic bead 
separation (STEMCELL Technologies). Recipient C57BL/6 or Pten^pc−/− mice were 
lethally irradiated (900 rad) and transplanted intravenously 2 h later with 10⁷ viable 
bone marrow cells from either C57BL/6 or IL-23p19KO mice. For TRAMP-C1 
allografts, the animals were challenged subcutaneously with TRAMP-C1 cells upon 
bone marrow engraftment. When tumours reached approximately 100 mm³, mice 
were surgically castrated and monitored for tumour progression. 

Magnetic resonance imaging. Magnetic resonance imaging (MRI) was performed 
on castrated Ptenpc−/− mice 0, 4, 8, 12 and 16 weeks after surgical castration or 
on CTX Ptenpc−/−;Il23aWT and CTX Ptenpc−/−;Il23aKO mice 4, 8, 12 and 16 weeks 
after surgical castration using a 7T preclinical magnetic resonance scanner (Bruker, 
BioSpec 70/30 USR, Paravision 5.1) equipped with 450/675 mT/m gradients 
(slew-rate: 3400-4500 T/m/s; rise-time 140 µs) and a mouse body volume coil. 
Mice were under general anaesthesia by 1.5-2% isoflurane vaporized in 100% 
oxygen (flow: 1 l min⁻¹). Breathing and body temperature were monitored (SA 
Instruments, Inc.) and maintained around 30 breaths per minute and 37 °C, respectively. 
MRI studies included a Rapid Acquisition with Relaxation Enhancement 
(RARE) High-Resolution T2-weighted (T2w) sequence with fat suppression 
acquired in the axial plane (TR = 3,800 ms, TE = 45 ms, FOV = 27 mm × 18 mm, 
spatial resolution = 0.094 × 0.087 mm² per pixel, scan time = 8 min, 
thickness = 0.70 mm, 26 slices) and in the coronal plane (TR = 3,500 ms, TE = 38 ms, 
FOV = 33 mm × 33 mm, spatial resolution = 0.129 × 0.129 mm² per pixel, scan 
time = 5 min, thickness = 0.60 mm, 20 slices). Images were analysed using NIH 
software MIPAV (version 7.4.0). The circumference of the whole prostate was 
outlined on each RARE T2w axial slice containing identifiable prostate and the 
number of bounded pixels in each slice was computed and added to yield the 
prostate volume. Coronal T2w images were used for the accurate identification of 
the basal and apical limits of the prostate. 
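
The pixel-summation step can be sketched in Python as follows. The sketch assumes that each bounded pixel on an axial slice contributes one voxel of in-plane size 0.094 × 0.087 mm and thickness 0.70 mm, with no gap between slices; the conversion to mm³ and all names are illustrative assumptions, not the MIPAV procedure itself.

```python
from typing import Sequence, Tuple

# Axial acquisition parameters quoted in the text (assumed to define the voxel size).
AXIAL_PIXEL_SIZE_MM: Tuple[float, float] = (0.094, 0.087)
AXIAL_SLICE_THICKNESS_MM: float = 0.70


def prostate_volume_mm3(
    bounded_pixels_per_slice: Sequence[int],
    pixel_size_mm: Tuple[float, float] = AXIAL_PIXEL_SIZE_MM,
    slice_thickness_mm: float = AXIAL_SLICE_THICKNESS_MM,
) -> float:
    """Sum the outlined pixels over all slices and scale by the voxel volume."""
    if any(count < 0 for count in bounded_pixels_per_slice):
        raise ValueError("pixel counts must be non-negative")
    voxel_volume_mm3 = pixel_size_mm[0] * pixel_size_mm[1] * slice_thickness_mm
    return sum(bounded_pixels_per_slice) * voxel_volume_mm3


if __name__ == "__main__":
    # Hypothetical pixel counts for five consecutive axial slices.
    counts = [850, 1420, 1630, 1390, 790]
    print(f"prostate volume: {prostate_volume_mm3(counts):.1f} mm^3")
```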
Differentiation of BM-MDSCs in vitro. Mouse BM-MDSCs were differentiated 
in vitro as previously described**. In brief, bone marrow precursors were flushed 
from the femurs of C57BL/6 or IL-23p19ko mice with RPMI 1640 medium. 
The cell pellet was resuspended (one femur in 10 ml) in RPMI 1640 containing 
10% heat-inactivated FBS and the cells were cultured in vitro in the presence of 
40 ng ml⁻¹ GM-CSF and 40 ng ml⁻¹ IL-6. On day 4, the cells were washed and 
resuspended with RPMI 1640 containing 10% heat-inactivated charcoal-stripped 
FBS. The following day, the cells were stimulated with PMA and ionomycin; after 4 h, 
the supernatant was collected and stored at −80 °C. Analysis of soluble molecules 
was conducted with Mouse CytokineMAP B version 1.0 (Rules Based Medicine). 
Human BM-MDSCs were differentiated in vitro by seeding 10° per ml bone 
marrow precursors in T25 flasks with RPMI 1640 containing 10% heat-inactivated 
FBS in the presence of 10 ng ml⁻¹ GM-CSF and 10 ng ml⁻¹ IL-6 for seven days”. 
Complete medium was changed when required. After seven days, the cells were 
analysed by flow cytometry for CD11b, CD33, CD15 and HLA-DR expression and, 
when the CD11b+CD33+CD15+HLA-DR™ population was higher than 80%, 
the cells were resuspended in RPMI 1640 containing 10% heat-inactivated 
charcoal-stripped FBS and, after one day, stimulated with PMA and ionomycin for 
5 h. The supernatant was then collected and stored at −80 °C. 
In vitro culture experiments. Prostate cancer cell lines were starved in charcoal-stripped 
FBS medium for 72 h and then cultured with RPMI 1640 containing 
10% heat-inactivated FBS (normal medium) or kept in full androgen-deprivation 
medium (FAD; RPMI 1640 containing 10% heat-inactivated charcoal-stripped FBS 
plus 10 µM ENZA). The cells were then stimulated with or without conditioned 
medium obtained from activated BM-MDSCs, or recombinant IL-23 (100 ng ml⁻¹; 
R&D Systems), with or without the RORγ antagonist SR2211 (5 µM; Calbiochem). The 
cells were then analysed using the crystal violet assay (after 72 h of culture, fold 
change over the FAD condition), stained with annexin V and 7AAD (after 72 h 
of culture) or collected for RNA extraction (after 24 h of culture; fold change over 
the FAD condition). 
Analyses of IL23A and IL23R mRNA expression in clinical tumours. CSPC 
RNA-sequencing data for 550 patients were downloaded from the UCSC Cancer 
Browser (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/). Metastatic 
castration-resistant prostate cancer (mCRPC) RNA-sequencing data for 122 patients 
with mCRPC were generated as part of the SU2C effort’. The paired-end transcriptome 
sequencing reads were aligned to the human reference genome (GRCh37/hg19) 
using TopHat2"! (version 2.1.0). Gene expression, as fragments per kilobase of exon 
per million fragments mapped (FPKM; normalized measure of gene expression), 
was calculated using Cufflinks*. MDSC marker (CD11b, CD33, CD14 and CD15) 
positive and negative groups were defined by the high and low quantiles of RNA 
expression of each transcript, and IL23A and IL23R expression levels in each biomarker 
group were compared using Student's t-test. To compare gene expression levels 
between TCGA and SU2C with minimum experimental bias, we included only 
genes expressed in both TCGA and SU2C with a median expression level (FPKM) 
>0. The gene expression levels in each sample were quantile-normalized, and 
IL23A expression levels in CSPC and CRPC were compared using Student's t-test. 
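
The gene filter and the quantile normalization described in this paragraph can be illustrated with a small Python sketch. It is a generic textbook version, assuming each sample is a list of FPKM values in the same gene order and that ties are broken by input order; the function names and data layout are hypothetical and do not reproduce the authors' pipeline.

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def genes_expressed_in_both(
    cohort_a: Dict[str, Sequence[float]], cohort_b: Dict[str, Sequence[float]]
) -> List[str]:
    """Genes present in both cohorts whose median FPKM is > 0 in each cohort."""
    shared = cohort_a.keys() & cohort_b.keys()
    return sorted(
        gene for gene in shared
        if median(cohort_a[gene]) > 0 and median(cohort_b[gene]) > 0
    )


def quantile_normalize(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Give every sample the same distribution: the mean of the sorted samples.

    Each inner sequence holds one sample's values, in the same gene order.
    """
    n_genes = len(samples[0])
    if any(len(sample) != n_genes for sample in samples):
        raise ValueError("all samples must cover the same genes")
    sorted_samples = [sorted(sample) for sample in samples]
    # Reference distribution: mean value at each rank across samples.
    rank_means = [mean(s[rank] for s in sorted_samples) for rank in range(n_genes)]
    normalized = []
    for sample in samples:
        order = sorted(range(n_genes), key=lambda i: sample[i])
        values = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            values[gene_index] = rank_means[rank]
        normalized.append(values)
    return normalized


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization every sample shares the same set of values, so a between-cohort comparison of a single gene reflects its rank within each sample rather than cohort-wide scaling differences.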
Human organoids. Organoids were grown in 3D Matrigel (356231, Corning) 
under prostate epithelial conditions”. Cell viability was measured using 
CellTiter-Glo 3D reagent (G9681, Promega) by quantifying metabolically active 
cells releasing ATP. Cell line-derived organoids were plated at a density of 2,000 
cells per well in 96-well optical plates (3610, Corning) embedded in Matrigel as 
hanging drops (5 µl per well). Cells were treated with recombinant IL-23 (300-014, 
PeproTech) at 100 ng ml⁻¹ or cultured with ENZA (10 µM) with or without recombinant 
IL-23. Luminescence measurements were performed after seven days in 
culture. Each IL-23 condition was normalized to its experimental control. 
Characterization of the immune tumour microenvironment. Tumours were disaggregated 
and digested in collagenase D and DNase for 30 min at 37 °C to obtain 
a single-cell suspension. For intracellular cytokine detection, cells were stimulated 
for 5 h with PMA and ionomycin plus Golgi Plug. After neutralization of unspecific 
binding with a CD16/CD32 antibody (clone 93), single-cell suspensions were 
stained with specific monoclonal antibodies (directly conjugated primary antibodies, 
diluted 1:200) to assess the phenotype. The antibodies used were: CD45 
(clone 30-F11, lot no. B235438); Ly-6G (clone 1A8, lot no. B194432); Ly6C (clone 
HK1.4, lot no. B243043); CD11b (clone M1/70, lot no. B233927); F4/80 (clone 
BM8, lot no. 4305911); CD206 (clone C068C2, lot no. B230155); CD11c (clone 
N418, lot no. B226270); B220 (clone RA3-6B2, lot no. B210434); CD3 (clone 
145-2C11, lot no. B241616); CD8 (clone 53-6.7, lot no. B193838); CD4 (clone GK1.5, 
lot no. B240053); NK1.1 (clone PK136, lot no. 4291566); CD90.2 (clone 30-H12, lot 
no. B190542); PD-L1 (clone 10F.9G2, lot no. B191993); EpCAM (clone G8.8, lot no. 
B230070); pan-cytokeratin (clone C11, lot no. 4528S); IL-17 (clone TC11-18H10.1, 
lot no. B201753); IL-23p19 (clone FC23CPG, lot no. 4321359); isotype (rat IgG1, 
kappa, eBRG1); IL-23R (clone 12B2B64, lot no. 4321359). For flow gating, we used 
isotype controls or fluorescence-minus-one controls. All the antibodies were purchased 
from eBioscience or Biolegend. Samples were acquired on a BD Fortessa 
flow cytometer (BD Biosciences). Data were analysed using FlowJo software 
(TreeStar). 

Immunohistochemistry and immunofluorescence of mouse tissues. For 
immunohistochemistry, tissues were fixed in 10% formalin (5701, ThermoScientific) 
and embedded in paraffin in accordance with standard procedures. Before 
immunohistochemical staining, tumour sections (4 µm) were exposed to two 
washes with OTTIX plus solution (X0076, Diapath) and subsequent hydration with 
OTTIX shaper solution (X0096, Diapath) followed by deionized water. Antigen 
unmasking was performed by heating sections at 98 °C for 20 min in the pH solution 
appropriate for each antibody. Subsequently, the sections were blocked 
for peroxidases and nonspecific binding of antibodies using 3% H2O2 (23615.248, 
VWR) and Protein-Block solution (X0909, DAKO Agilent technologies), respectively, 
for 10 min each, separated by washing in 0.5% PBST. Haematoxylin and eosin staining 
was performed according to standard procedures. Sections were stained with 
anti-Ki-67 (clone SP6; Laboratory Vision Corporation) and anti-pSTAT3 (Tyr705; 
clone D3A7; Cell Signaling). Images were obtained using objectives of 5x, 10x and 
40x magnification, with pixel sizes of 2.24 µm, 1.12 µm and 0.28 µm, respectively. 
All the quantifications were done using the public online software ImmunoRatio 
(http://153.1.200.58:8080/immunoratio/). For the immunofluorescence staining, 
paraffin-embedded tissue sections were stained with 4’,6-diamidine-2’-phenylindole 
dihydrochloride (DAPI) (70238421, Roche), anti-IL-23 (ab45420, Abcam) and 
anti-Ly6G (RB6-8C5, GeneTex). Confocal images were obtained with the Leica 
TCS SP5 confocal microscope using a 10x/1.25 NA oil objective. 

In vitro T cell suppression assay. In vitro suppression assays were carried out in 
RPMI with 10% FCS in 96-well U-bottom plates (Corning). Naive splenocytes were 
labelled with 5 µM CFSE (Molecular Probes) and activated in vitro with anti-CD3 
and anti-CD28 beads (Invitrogen) according to the manufacturer’s instructions. 
Conditioned medium from BM-MDSCs was added to the culture. After three 
days, the proliferation of CFSE-labelled CD8+ T cells was analysed on a BD Fortessa flow cytometer. 
CRISPR-Cas9 transfection. TRAMP-C1 cells were grown in 75-cm² flasks to 
50-60% confluency in DMEM medium supplemented with 10% heat-inactivated 
FBS, 100 U ml⁻¹ penicillin, 0.1 mg ml⁻¹ streptomycin and 2 mM L-glutamine. The 
transfection of the Il23r CRISPR-Cas9 KO plasmid (Santa Cruz Biotechnology) 
was performed using jetPRIME transfection reagent according to the manufacturer’s 
protocol at a 1:2 DNA:jetPRIME ratio. After 24 h of transfection, GFP-transduced 
cells were sorted to 99% purity and single cells were plated in 96-well 
plates. At day 7 after sorting, the grown cell clones were moved into 24-well plates 
for further expansion. The knockout of the Il23r gene in each cell colony was 
confirmed by western blot. 

NanoString. The nCounter analysis system (NanoString Technologies) was used to 
screen for the expression of signature genes associated with cancer-inflammation 
pathways. Two specific probes (capture and reporter) for each gene of interest 
were used. In brief, 5 µl of RNA (at a concentration higher than 25 ng µl⁻¹) was 
hybridized with the customized Reporter CodeSet and Capture ProbeSet of the Mouse 
PanCancer Immune Profiling Panel, which includes 700 selected genes (NanoString 
Technologies), according to the manufacturer's instructions for direct labelling of 
mRNAs of interest with molecular barcodes without the use of reverse transcription 
or amplification. Total RNA was quantified using a NanoDrop ND-1000 Spectrophotometer 
(NanoDrop Technologies, Wilmington, DE) and RNA quality was 
assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies). The hybridized 
samples were then recovered in the NanoString Prep Station and the mRNA 
molecules were counted with the NanoString nCounter. For expression analysis, 
each sample profile was normalized to the geometric mean of 20 housekeeping genes 
included in the panel. 
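
The housekeeping normalization mentioned above can be illustrated with a short Python sketch. It assumes the simplest reading of the text, in which each gene count in a sample is divided by the geometric mean of that sample's housekeeping counts; the gene names and counts are invented for illustration, and vendor software may apply a different scaling.

```python
from statistics import geometric_mean
from typing import Dict, Iterable


def normalize_to_housekeeping(
    counts: Dict[str, float], housekeeping_genes: Iterable[str]
) -> Dict[str, float]:
    """Divide each non-housekeeping count by the geometric mean of the housekeeping counts."""
    housekeeping = set(housekeeping_genes)
    reference_counts = [counts[gene] for gene in housekeeping]
    if not reference_counts or any(count <= 0 for count in reference_counts):
        raise ValueError("housekeeping counts must be present and positive")
    scale = geometric_mean(reference_counts)
    return {
        gene: count / scale
        for gene, count in counts.items()
        if gene not in housekeeping
    }


if __name__ == "__main__":
    # Hypothetical raw counts for one sample; Hk1 and Hk2 stand in for housekeeping genes.
    sample = {"Il23a": 200.0, "Cxcl1": 50.0, "Hk1": 100.0, "Hk2": 400.0}
    print(normalize_to_housekeeping(sample, ["Hk1", "Hk2"]))
```

The geometric mean is less sensitive than the arithmetic mean to a single highly expressed housekeeping gene, which is the usual reason for choosing it as a reference.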

Immune tumour microenvironment characterization of tumours from patients 
with prostate cancer. Tumours were disaggregated and digested in collagenase I 
and DNase for 20 min at 37 °C to obtain single-cell suspensions. For intracellular 
cytokine detection, cells were stimulated for 5h with PMA and ionomycin plus Golgi 
Plug. Single-cell suspensions were stained with specific monoclonal antibodies 
diluted 1:200 (primary antibodies directly conjugated) to assess the phenotype. 
The antibodies used were: CD45RA (clone MEM-56, 1:50), CD33 (clone WM53), 
CD11b (clone ICRF44), CD15 (clone W6D3), HLA-DR (clone L243), IL-23p19 
(clone 23DCDP). For flow gating, we used isotype controls or fluorescence-minus-one 
controls. All antibodies were purchased from eBioscience or Biolegend. 
Samples were acquired on a BD Fortessa flow cytometer (BD Biosciences). Data 
were analysed using FlowJo software (TreeStar). 

Protein profiling. Plasma pools of patients with CSPC or CRPC were processed 
as indicated in the Human XL Cytokine Array Kit (R&D Systems). Pools of tissue 
lysates of tumours from sham and castrated Ptenpc−/− mice were processed as 
indicated in the Mouse XL Cytokine Array Kit (R&D Systems). Developed films 
were scanned, the obtained images were analysed using ImageJ version 1.43u and 
background signals were subtracted from the experimental values. 

Multiplex immunofluorescence in formalin-fixed paraffin-embedded tissue 
section. PMN-MDSC panel (CD15, CD11b, CD33 and EpCAM). Multiplex 
immunofluorescence for CD15 (M3631, Dako, clone Carb-3), CD33 (ab11032, 
Abcam, clone 6C5/2), CD11b (ab52477, Abcam, clone EP1345Y) and EpCAM 
Alexa Fluor 647 conjugate (5447S, Cell Signaling, clone VU1D9) was performed 
using 4-µm sections of formalin-fixed paraffin-embedded (FFPE) prostate tumour 
samples by sequential staining after antigen retrieval in CC1 (pH 8.5) (950-224, 
Ventana) in a water bath at 98°C for 36 min. First, mouse monoclonal (IgG1) 
antibody anti-CD33 (1:100 dilution), mouse monoclonal (IgM) anti-CD15 (1:200 
dilution) and rabbit monoclonal (IgG) antibody anti-CD11b (1:100 dilution) 
were incubated for 1h after blocking with 10% goat serum for 30 min. Slides were 
then incubated with goat anti-mouse IgG1 Alexa Fluor 555-conjugated (A21127, 
Life Technologies), goat anti-mouse IgM Alexa Fluor 488-conjugated (A21042, 
Life Technologies) and goat anti-rabbit IgG (H+L) Alexa Fluor 700-conjugated 
(A21038, Life Technologies) antibodies for 30 min. Next, tissue sections were 
treated with 5% mouse or rabbit normal serum for 30 min, followed by incubation 
with a mouse monoclonal (IgG1) anti-EpCAM antibody conjugated to Alexa Fluor 
647 (dilution, 1:200) for 1h. The samples were washed three times for 5 min with 
TBS Tween 0.05% between incubations. Nuclei were counterstained with DAPI 
(70238421, Roche) and tissue sections were mounted with ProLong Gold antifade 
reagent (P36930, Molecular Probes). 

CD15, IL-23 and EpCAM. Immunofluorescence was performed on 4-µm FFPE 
tissue sections using an automated staining platform (Bond-RX, Leica 
Microsystems). In brief, antigen retrieval was achieved using ER1 (pH 6.0) 
(AR9961, Leica Biosystems) for 30 min. Sections were blocked in 10% normal 
goat serum for 30 min at room temperature. Primary antibodies, including mouse 
monoclonal (IgM) anti-CD15 (M3631, Dako, clone Carb-3, dilution 1:200), rabbit 
monoclonal (IgG) anti-IL-23 (ab190356, Abcam, clone EPR5585(N), dilution 
1:100) and mouse monoclonal (IgG1) anti-EpCAM (29298, Cell Signaling, clone 
VU1D9, dilution 1:500), were incubated for 1 h. Slides were then incubated with 
goat anti-rabbit (H+L) Alexa Fluor 555-conjugated (A21429, Life Technologies), 
goat anti-mouse IgM Alexa Fluor 488-conjugated (A21042, Life Technologies) and 
goat anti-mouse IgG1 Alexa Fluor 647-conjugated (A21240, Life Technologies) 
antibodies for 30 min. Nuclei were counterstained with DAPI (70238421, Roche) 
and tissue sections were mounted with ProLong Gold antifade reagent (P36930, 
Molecular Probes). 

Microscopy and image acquisition. After staining, slides were scanned using the 
multi-spectral camera of the Vectra system (PerkinElmer). The number of 
images collected per case was dependent on tumour size, from a minimum of 1 to a 
maximum of 18 (average = 12). Quantification of PMN-MDSC-like cells 
(CD15+CD33+CD11b+) was performed using inForm v.2.1.1 software (PerkinElmer) and 
the density of cells of interest is presented as the number of cells per mm². A tissue 
segmentation algorithm based on EpCAM positivity was used to separate tumour 
from adjacent stroma. The algorithm was trained to perform cell segmentation 
using counterstaining-based segmentation achieved with nuclear DAPI staining. 
Phenotype determination was based on positivity for CD15, CD33 and CD11b. 
Cells in tumour areas selected by the algorithm were then separated into bins 
as follows: CD15+CD33+CD11b+ cells were called PMN-MDSC-like cells and 
CD15−CD11b+ cells were called CD15−CD11b+ cells. All tissue segmentation, cell 
segmentation and phenotype determination maps were reviewed by a pathologist. 
Validation of antibody specificity for multiplex immunofluorescence. 
Immunohistochemistry was performed on 4-µm FFPE tissue sections using an 
automated staining platform (Bond-RX, Leica Microsystems). Optimal antibody 
concentrations were determined for primary antibodies against CD15 (M3631, 
Dako, clone Carb-3, dilution 1:200), CD33 (ab11032, Abcam, clone 6C5/2, dilution 
1:100), CD11b (ab52477, Abcam, clone EP1345Y, dilution 1:100), IL-23 (ab190356, 
Abcam, clone EPR5585(N), dilution 1:100) and EpCAM (29298, Cell Signaling, 
clone VU1D9, dilution 1:500). Antibody labelling was detected with the Bond 
Polymer Refine Detection Kit (DS9800, Leica Microsystems). 3,3′-diaminobenzidine 
tetrahydrochloride (DAB) was used as chromogen and the slides were 
counterstained with haematoxylin. Human control samples included colorectal 
specimens. In each staining batch, positive and negative controls were incubated 
with and without primary antibody. 

RNA expression and qPCR. RNA isolation (TRIzol, Qiagen) and reverse transcription 
with SuperScriptIII (Invitrogen, 11752-250) were performed according to the 
manufacturer’s instructions. qPCR reactions (Bio-Rad) were performed using 
KAPA SYBR FAST qPCR green (KK4605; Applied Biosystems) and the specific 
primers reported below. Primer sequences were obtained from PrimerBank 
(http://pga.mgh.harvard.edu/primerbank/index.html) or Bio-Rad. Each expression value 
was normalized to the HPRT or GAPDH level as reference. The primer sequences 
used were as follows: CXCL1 forward, 5′-CTGGGATTCACCTCAAGAACATC-3′; 
reverse, 5′-CAGGGTCAAGGCAAGCCTC-3′. CXCL2 forward, 
5′-GCGTCACACTCAAGCTCTG-3′; reverse, 5′-CCAACCACCAGGCTACAGG-3′. CXCL3 
forward, 5′-ATCCCCCATGGTTCAGAAA-3′; reverse, 
5′-ACCCTGCAGGAAGTGTCAAT-3′. CXCL5 forward, 5′-GTTCCATCTCGCCATTCATGC-3′; 
reverse, 5′-GCGGCTATGACTGAGGAAGG-3′. GAPDH forward, 
5′-AGGTCGGTGTGAACGGATT-3′; reverse, 5′-TGTAGACCATGTAGTTGAG-3′. 
IL23p19 forward, 5′-CCAGCAGCTCTCTCGGAATC-3′; reverse, 
5′-TCATATGTCCCGCTGGTGC-3′. Bio-Rad primers used were: Hprt PrimePCR PreAmp 
for SYBR Green Assay: Hprt, mouse qMmuCID0005679; Ar PrimePCR PreAmp for 
SYBR Green Assay: Ar, mouse qMmuCID0005164; Nkx3-1 PrimePCR PreAmp 
for SYBR Green Assay: Nkx3-1, mouse qMmuCED0046482; Pbsn PrimePCR 
PreAmp for SYBR Green Assay: Pbsn, mouse qMmuCID0017831; Fkbp5 
PrimePCR PreAmp for SYBR Green Assay: Fkbp5, mouse qMmuCID0023283. 
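
Normalization of a qPCR value to a reference gene is commonly done with the comparative Ct method, and the sketch below shows that calculation in Python. The text does not state which calculation was used, so the 2^(−ΔCt) form, the assumption of perfect doubling per cycle, and all names and Ct values here are illustrative only.

```python
from typing import Tuple


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target relative to a reference gene: 2 ** -(Ct_target - Ct_reference).

    Assumes 100% amplification efficiency (one doubling per cycle).
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(sample: Tuple[float, float], control: Tuple[float, float]) -> float:
    """Fold change of a sample over a control condition.

    Each argument is a (Ct_target, Ct_reference) pair measured in that condition.
    """
    return relative_expression(*sample) / relative_expression(*control)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene and reference gene in each condition.
    treated = (22.0, 20.0)
    control = (24.0, 20.0)
    print(f"fold change over control: {fold_change(treated, control):.2f}")
```

Expressing results as a ratio to a control condition matches the way fold changes over the FAD condition are reported elsewhere in the Methods.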
Western blot analyses and protein detection. Tissue and cell lysates were pre- 
pared with RIPA buffer (1x PBS, 1% Nonidet P40, 0.5% sodium deoxycholate, 
0.1% SDS and protease inhibitor cocktail; Roche). The total protein concentration 
was measured using a BCA Protein Assay Kit (23225; Pierce). Equal amounts of 
proteins were separated by SDS-PAGE and western blotted onto a 0.45-µm nitrocellulose 
membrane. Membranes were blocked in 5% defatted milk or 5% BSA 
in Tris-buffered saline containing 0.1% Tween-20 (TBST), probed with diluted 
antibodies and incubated at 4 °C overnight. The following primary antibodies 
were used: rabbit polyclonal anti-HSP90 (1:1,000 dilution, Cell Signaling), rabbit 
polyclonal anti-pSTAT3 (Tyr705) (1:1,000 dilution, Cell Signaling), rat monoclonal 
anti-RORγt (5:1,000 dilution, clone AFKJS-9, eBioscience), rabbit polyclonal 
anti-IL-23R (H-300) (1:1,000 dilution, Santa Cruz). After washing in TBST, the 
membrane was incubated with secondary antibodies that were conjugated to horse- 
radish peroxidase (HRP) (1:5,000 dilution, Cell Signaling). The protein bands were 
visualized using the ECL Western Blotting Substrate (Pierce). 

Samples from human prostates. Samples were acquired from patients with 
mCRPC, who had given their written informed consent to institutional protocols 
approved either by the Royal Marsden NHS Foundation Trust Hospital (London, 
UK) Ethics Committee (reference no. 04/Q0801/60), the IRCCS Ospedale 
San Raffaele (Milan, Italy) Ethics Committee (reference no. 99/INT/2004; 58/ 
INT/2010) or the Azienda Ospedaliera di Padova (Padova, Italy) Ethics Committee 
(reference no. CESC/958P/2005). Human biological samples were sourced ethically 
and their research use was in accord with the terms of the informed consent 
provided. Fifty-one patients with CRPC treated at The Royal Marsden NHS 
Foundation Trust Hospital who had sufficient formalin-fixed, paraffin-embedded 
tissue and matching CSPC and CRPC biopsies were identified for multiplex 
immunofluorescence (see Supplementary Table 1). Four patients with CRPC, enrolled at Azienda 
Ospedaliera di Padova, and four patients with CSPC, enrolled at IRCCS Ospedale 
San Raffaele, were selected to perform the immune tumour microenvironment 
characterization by flow cytometry analyses. Case selection was blinded to baseline 
characteristics, treatments received, clinical outcome and molecular characterization 
to reduce any potential selection bias. Finally, plasma samples from 120 patients with 
CRPC with sufficient samples stored (including 28 plasma samples within 40 days 
of CRPC biopsy) and 20 patients with CSPC were analysed for IL-23 levels. 
Statistical analysis and reproducibility. Data analyses were carried out using 
GraphPad Prism version 7. The data are mean ± s.e.m., shown as individual values in scatter 
plots with column bar graphs, and were analysed using two-sided Student's t-tests (paired or 
unpaired according to the experimental setting), followed, when 
indicated, by the Wilcoxon signed-rank test. One-way ANOVA was used 
to compare three or more groups in time point analyses. Differences were considered 
significant when P < 0.05 and are indicated as ns (not significant), *P < 0.05, 
**P < 0.01, ***P < 0.001. Non-parametric tests were applied, using the SPSS statistical 
software, when variables were not normally distributed. n values represent 
biological replicates. Survival curves were compared using the log-rank test 
(Mantel-Cox). Because of evidence of overdispersion, tumour-infiltrating PMN-MDSC 
(CD15+CD11b+CD33+EpCAM−) and CD15−CD11b+ cells were analysed 
using a mixed-effect negative binomial regression model (with a per-patient random 
intercept) when comparing paired biopsies, and a negative binomial regression 
model was used when analysing the association between CRPC biopsies and IL-23. 
PMN-MDSC (coefficient, 1.49; 95% confidence interval, 0.83-2.15; P < 0.001); 
CD15−CD11b+ cells (coefficient, 0.43; 95% confidence interval, 0.04-0.83; not 
significant (P > 0.05)). All statistics and reproducibility information are reported 
in the figure legends. For animal studies, sample size was defined on the basis of 
past experience with the models", to detect differences of 20% or greater between 
the groups (10% significance level and 80% power). For ethical reasons the minimum 
number of animals necessary to achieve the scientific objectives was used. 
Animals were allocated randomly to each treatment group. Different treatment 
groups were processed identically and animals in different treatment groups were 
exposed to the same environment. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Source Data for the figures and extended data figures are pro- 
vided in the online version of the paper. CSPC and mCRPC tumour biopsy mRNA- 
seq data that support the findings of this study are available in the SU2C-PCF IDT 
cBioportal (http://www.cbioportal.org) and through dbGAP (https://www.ncbi. 
nlm.nih.gov/gap/) with the identifier phs000915.v1.p1”. 

Extended Data Fig. 1 | Multispectral images of PMN-MDSCs in 
human biopsies and set-up of the different CRPC mouse models. 
a, Multispectral microscopy images (EpCAM, yellow; CD15, green; 
CD33, red; CD11b, pink) of castration-sensitive and castration-resistant 
prostate cancers. n = 3 biologically independent patients. Scale bars, 
20 µm. b, Quantification of the number of CD11b+CD15− cells in the 
tumour of castration-sensitive and castration-resistant prostate cancers 
(CSPC, n = 51; CRPC, n = 51 biologically independent patients). Cells 
were counterstained with the nuclear marker DAPI (blue). Statistical 
analyses (negative binomial regression model): P = 0.062. c, MRIs of 
one representative sham-operated (Sham) or surgically castrated (CTX) 
Ptenpc−/− mouse of the three analysed at different time points. d, Waterfall 
plot depicting proportional change in tumour response for sham (n = 3) 
and CTX (n = 3) Ptenpc−/− mice. e, Prostate PMN-MDSC frequencies 
determined by flow cytometry in sham (n = 3) and CTX (n = 3) Ptenpc+/+ 
mice (12 weeks after castration). Statistical analyses (two-sided unpaired 
Student's t-test): P = 0.85. f, Schematic representation of the experiment. 
Six-week-old C57BL/6 males were challenged subcutaneously with 
TRAMP-C1 cells. When tumours reached approximately 100 mm³, mice 
were sham-operated (sham, n = 9) or surgically castrated (CTX, n = 5). 
g, Tumour PMN-MDSC frequencies were determined by flow cytometry 
during castration-sensitive and castration-resistant phases. Sham CSPC, 
n = 5; CTX CSPC, n = 4; sham CRPC, n = 8; CTX CRPC, n = 6. 
h, Schematic representation of the experiment. Six-week-old FVB males 
were challenged subcutaneously with MyC-CaP cells. When tumours 
reached approximately 100 mm³, mice were sham-operated (sham, n = 3) 
or surgically castrated (CTX, n = 3). i, Tumour PMN-MDSC frequencies 
were determined by flow cytometry during castration-sensitive and 
castration-resistant phases. Sham CSPC, n = 3; CTX CSPC, n = 4; sham 
CRPC, n = 4; CTX CRPC, n = 3. b, d, e, g, i, Data are mean ± s.e.m. 
d, g, i, Statistical analyses (unpaired two-sided Student's t-test): ns, not 
significant; *P < 0.05; **P < 0.01; ***P < 0.001. f, h, Statistical analyses 
(two-sided unpaired Student's t-test followed by Wilcoxon signed-rank 
test): *P < 0.05. 

Extended Data Fig. 3 | Factors secreted by MDSCs promote insensitivity 
to ADT in androgen-dependent mouse and human prostate cancer 
cell lines and the CXCR2 antagonist impairs tumour recruitment 
of MDSCs in Ptenpc−/− mice. a, Representative dot plot reporting 
the BM-MDSCs after in vitro differentiation. Data were validated in 
two biologically independent experiments. b, Experimental scheme. 
c, Cell proliferation of MyC-CaP cells (none, n = 5; conditioned medium 
(C.M.) from BM-MDSCs, n = 3; FAD, n = 13; FAD and BM-MDSCs, 
n = 13 biologically independent samples). d, Percentage of annexin V− and 
7AAD− MyC-CaP cells. e, Percentage of annexin V+ and 7AAD+ MyC-CaP 
cells. f, qRT-PCR analyses of the indicated genes in MyC-CaP cells. 
g, h, i, Cell proliferation of VCaP (none, n = 4; conditioned medium from 
human BM-MDSCs, n = 4; FAD, n = 7; FAD and human BM-MDSCs, 
n = 8 biologically independent samples), 22Rv1 (none, n = 4; conditioned 
medium from human BM-MDSCs, n = 8; FAD, n = 8; FAD and human 
BM-MDSCs, n = 4 biologically independent samples) and PC3 (none, n = 4; 
conditioned medium from human BM-MDSCs, n = 8; FAD, n = 4; FAD 
and human BM-MDSCs, n = 8 biologically independent samples) prostate 
cancer cells. j, Tumour MDSC frequencies determined by flow cytometry 
of prostate tumours of CTX Ptenpc−/− mice treated or not with CXCR2 
antagonist (αCXCR2) at completion of the study (12 weeks after CTX). 
CTX castration-sensitive, n = 3; CTX and CXCR2 antagonist castration-sensitive, 
n = 3; CTX castration-resistant, n = 4; CTX and CXCR2 
antagonist castration-resistant, n = 7 biologically independent mice. k, Cell 
proliferation of TRAMP-C1 cells after 72 h of treatment with CXCR2 
antagonist. l, qRT-PCR analyses of the indicated genes in TRAMP-C1 
cells after 24 h of treatment (fold change compared to the FAD condition). 
k, l, Aggregated data from three independent experiments are reported, 
fold change compared to the FAD condition. c-l, Data are mean ± s.e.m. 
c-f, n = 3 biologically independent samples. d, e, g-j, Statistical analyses 
(unpaired two-sided Student’s t-test): ns, not significant; *P < 0.05; 
**P < 0.01; ***P < 0.001. c, f, Statistical analyses (two-sided unpaired 
Student's t-test followed by Wilcoxon signed-rank test): *P < 0.05. 

Extended Data Fig. 4 | Impaired tumour recruitment of MDSCs 
enhances response to surgical castration in different allograft models 
of prostate cancers. a, Schematic representation of the experiment. 
Six-week-old C57BL/6 males were challenged subcutaneously with 
TRAMP-C1 cells. When tumours reached approximately 100 mm³, mice 
were surgically castrated and left untreated (CTX, n = 8) or treated with 
CXCR2 antagonist (CTX and CXCR2 antagonist, n = 9). Representative 
flow cytometry plots of PMN-MDSCs (CD11b+Ly6G+ cells, gated on 
CD45+ cells) in tumours for each experimental condition. b, RT-PCR 
analyses of the indicated genes in the prostate tumours after CTX or CTX 
and CXCR2 antagonist treatment (n = 3 per group). Data are mean ± s.e.m. 
Statistical analyses (unpaired two-sided Student's t-test): *P < 0.05; 
***P < 0.001. c, Mean tumour volume (±s.e.m.) for each experimental 
group. Statistical analyses (unpaired two-sided Student's t-test followed by 
Wilcoxon signed-rank test): ***P < 0.001. d, Survival curves are reported 
in Kaplan-Meier plot. Statistical analyses (two-sided log-rank test): 
***P < 0.001. e, Schematic representation of the experiment. Six-week-old 
FVB males were challenged subcutaneously with MyC-CaP cells. When 
tumours reached approximately 100 mm³, mice were surgically castrated 
and left untreated (CTX, n = 5) or treated with CXCR2 antagonist (CTX 
and CXCR2 antagonist, n = 5). Representative flow cytometry plots of 
PMN-MDSCs (CD11b+Ly6G+ cells, gated on CD45+ cells) in tumours 
for each experimental condition. f, RT-PCR analyses of the indicated 
genes in the prostate tumours after CTX or CTX and CXCR2 antagonist 
treatment (n = 3 per group). Data are mean ± s.e.m. Statistical analyses 
(unpaired two-sided Student's t-test): **P < 0.01; ***P < 0.001. g, Average 
tumour volume (±s.e.m.) for each experimental group. Statistical analyses 
(two-sided unpaired Student's t-test followed by Wilcoxon signed-rank 
test): *P < 0.05. h, Survival curves reported as Kaplan-Meier plot. 
Statistical analyses (two-sided log-rank test): **P < 0.01. i, Schematic 
representation of the experiment. Six-week-old NOD/SCID males were 
challenged subcutaneously with LNCaP cells or with LNCaP cells and 
human BM-MDSCs. When tumours reached approximately 70 mm³, 
mice were sham-operated (sham, n = 5) or sham-operated and injected 
every three days intraperitoneally with 3 x 10° human BM-MDSCs (sham 
and human BM-MDSCs, n = 5) or surgically castrated and left untreated 
(CTX, n = 8) or treated with human BM-MDSCs (CTX and human 
BM-MDSCs, n = 5). j, Average tumour volume (±s.e.m.) for each experimental 
group. Statistical analyses (unpaired two-sided Student's t-test followed by 
Wilcoxon signed-rank test): **P < 0.01. 
Extended Data Fig. 5 | IL-23 pathway is the most upregulated in the 
tumour after castration. a, Gene expression of selected genes determined 
by NanoString nCounter gene expression assay in sham Pten^pc−/− and 
CTX Pten^pc−/− tumours. Data are shown as pool of n = 5. b, Analyses of 
the conditioned medium of bone marrow-derived MDSCs tested for the 
indicated soluble molecules by Mouse CytokineMAP B version 1.0. The 
graph shows the concentration of the indicated soluble molecules as log10 
of the concentration found in the conditioned medium of BM-MDSCs; 
the background (culture medium) was subtracted from the values. Data are 
shown as pool of n = 10. c, qRT-PCR analyses of the indicated genes in 
sham (n = 6) and CTX (n = 6) Pten^pc−/− tumours. Data are mean ± s.e.m. 
of biologically independent animals. Statistical analyses (unpaired two-sided 
Student’s t-test): *P < 0.05. d, Protein level of CXCL1, CXCL2 and CXCL5 
in CTX Pten^pc−/− tumours. Data are analysed as ratio between CTX (pool 
of three samples) and sham (pool of three samples) Pten^pc−/− tumours and 
reported as fold increase in protein level. e, f, IL-23R protein level analysed 
by flow cytometry and western blot on TRAMP-C1 cells under normal 
culture conditions (FBS) or androgen-deprivation culture conditions 
(charcoal-stripped FBS). n = 4 biologically independent samples per group. 
f, Numbers indicate fold change in protein level. Loading control: anti- 
β-actin antibody. The western blot was validated at least twice. g, Protein 
profile of the plasma of patients with CSPC and CRPC. Data are analysed 
as ratio between CRPC (pool of 18 samples) and CSPC (pool of 17 
samples) and reported as fold increase in protein level. 
Extended Data Fig. 6 | Characterization of IL-23+ cells in the tumour 
of CTX Pten^pc−/− mice and patients with CRPC. a, Multispectral 
microscopy images (EpCAM, yellow; CD15, green; IL-23, red) of 

three patients with CRPC. b, Correlation analyses of the numbers of 
CD15~CD11b* cells in the tumour and IL-23 levels in the plasma 

of patients with CRPC (n = 28). Statistical analyses (negative binomial 
regression model): P = 0.63. c, d, IL23A and IL23R mRNA expression in 
the tumour of CSPCs (n = 549) and mCRPCs (n = 116). e, f, Expression 
of IL-23 in PMN-MDSC marker-positive (CD11b+CD33+CD15+) 
tumours from patients with CSPC or mCRPC. c–f, Statistical analyses 
(unpaired two-sided Student’s t-test) are reported. g, Representative plots 
of IL-23+, CD45* and CD45-, Ly6G>"8"'CD11bt and Ly6G™CD11b™, 
CD11b+F4/80+ cells pregated on the reported population in the tumour 
of CTX Pten^pc−/− mice. IL-23 gate was decided based on isotype control 
panel (insert). Pie chart shows the percentage of the different subsets gated 
on IL-23+ cells in the tumour of Pten^pc−/− mice (mean, n = 9). h, qRT- 
PCR analyses of IL-23 in the prostate tumours of castrated (CTX; n = 6) or 
castrated and treated with CXCR2 antagonist (CTX + CXCR2 antagonist; 
n = 7) Pten^pc−/− mice. Data are mean ± s.e.m. i, PMN-MDSC and TAM 
frequencies determined by flow cytometry in the tumour of castrated 
NSG TRAMP-C1 allografts upon treatment with isotype, CSF1R antibody 
or CXCR2 antagonist. Data are mean ± s.e.m. (n = 3 per group). j, qRT-PCR 
analyses of IL-23 in the tumour of castrated NSG TRAMP-C1 allografts 
upon treatment with isotype (n = 4), CSF1R antibody (n = 5) or CXCR2 
antagonist (n = 5). Data are mean ± s.e.m. Each dot represents a biologically 
independent animal. h–j, Statistical analyses (unpaired two-sided Student’s 
t-test): *P < 0.05; **P < 0.01; ***P < 0.001. 
Extended Data Fig. 9 | Genetic inhibition of IL-23 limits resistance to 
castration in prostate cancer in TRAMP-C1 allograft model in vivo. 
a, Schematic representation of the experiment. Six-week-old C57BL/6 males 
were lethally irradiated and transplanted with bone marrow precursors 
from Il23a^WT and Il23a^KO mice. After the bone marrow engraftment, 
the animals were challenged subcutaneously with TRAMP-C1 cells. 
When tumours reached approximately 100 mm³, mice were surgically 
castrated and monitored for tumour progression. b, Haematoxylin and 
eosin, Ki-67 and pSTAT3(Y705) immunohistochemical staining (Ki-67 
and pSTAT3(Y705), brown; nuclei, blue) of representative TRAMP-C1 
Il23a^WT and TRAMP-C1 Il23a^KO mice. Scale bars, 25 μm. c, Quantification 
of Ki-67+ cells is reported as a percentage of the total number of cells. 
TRAMP-C1 Il23a^WT (n = 8) and TRAMP-C1 Il23a^KO (n = 4), one tumour 
per mouse, mean of three sections per tumour, >3 fields per section. Data 
are mean ± s.e.m. of biologically independent mice. Statistical analyses 
(unpaired two-sided Student’s t-test): **P < 0.01. d, e, PMN-MDSC 
frequencies determined by flow cytometry in the tumour and in the 
spleen of TRAMP-C1 Il23a^WT (n = 3) and TRAMP-C1 Il23a^KO (n = 3) 
mice 10 days after castration. Data are mean ± s.e.m. f, Quantification 
of pSTAT3(Y705) reported as a percentage of the total number of cells. 
TRAMP-C1 Il23a^WT (n = 8) and TRAMP-C1 Il23a^KO (n = 4), one tumour 
per mouse, mean of three sections per tumour, >3 fields per section. 
Statistical analyses (unpaired two-sided Student's t-test): ***P < 0.001. 
g, Western blot for RORγ, pSTAT3(Y705) and total STAT levels in 
prostate tumours of TRAMP-C1 Il23a^WT and TRAMP-C1 Il23a^KO mice. 
Loading control: HSP90 antibody or total ERK antibody. The western blot 
was validated at least twice. h, Quantification is reported as mean ± s.e.m. 
of biologically independent experiments: TRAMP-C1 Il23a^WT RORγ, 
n = 9; and TRAMP-C1 Il23a^KO RORγ, n = 9; TRAMP-C1 Il23a^WT 
pSTAT3(Y705), n = 4; and TRAMP-C1 Il23a^KO pSTAT3(Y705), n = 3. 
Statistical analyses (unpaired two-sided Student's t-test): *P < 0.05. 
PMN-MDSCs progressively infiltrate the tumour bed, recruited mainly 
by CXCL5. Within the tumour, PMN-MDSCs start to produce higher 
amounts of IL-23, thus establishing a positive-feedback loop that induces 
the overexpression of IL-23R on the tumour epithelial cells and confers 
resistance to castration in prostate cancer by activating the STAT3-RORγ 
pathway. ENZA treatment can block the AR, sensitizing 
prostate cancer cells to androgen deprivation, but the persistent presence 
of PMN-MDSC-derived IL-23 overrides this drug sensitivity, leading to 
ADT resistance. Anti-IL-23 treatment reinstates sensitivity to castration in 
prostate cancer, enhancing the efficacy of ENZA. 
Scaling up molecular pattern recognition with 
DNA-based winner-take-all neural networks 


Kevin M. Cherry¹ & Lulu Qian¹,²* 


From bacteria following simple chemical gradients¹ to the brain 
distinguishing complex odour information², the ability to recognize 
molecular patterns is essential for biological organisms. This 
type of information-processing function has been implemented 
using DNA-based neural networks³, but has been limited to the 
recognition of a set of no more than four patterns, each composed 
of four distinct DNA molecules. Winner-take-all computation⁴ has 
been suggested⁵,⁶ as a potential strategy for enhancing the capability 
of DNA-based neural networks. Compared to the linear-threshold 
circuits⁷ and Hopfield networks⁸ used previously³, winner-take- 
all circuits are computationally more powerful⁴, allow simpler 
molecular implementation and are not constrained by the number 
of patterns and their complexity, so both a large number of simple 
patterns and a small number of complex patterns can be recognized. 
Here we report a systematic implementation of winner-take-all 
neural networks based on DNA-strand-displacement⁹,¹⁰ reactions. 
We use a previously developed seesaw DNA gate motif³,¹¹,¹², 
extended to include a simple and robust component that facilitates 
the cooperative hybridization¹³ that is involved in the process of 
selecting a ‘winner’. We show that with this extended seesaw motif 
DNA-based neural networks can classify patterns into up to nine 
categories. Each of these patterns consists of 20 distinct DNA 
molecules chosen from the set of 100 that represents the 100 bits in 
10 × 10 patterns, with the 20 DNA molecules selected tracing one of 
the handwritten digits ‘1’ to ‘9’. The network successfully classified 
test patterns with up to 30 of the 100 bits flipped relative to the digit 
patterns ‘remembered’ during training, suggesting that molecular 
circuits can robustly accomplish the sophisticated task of classifying 
highly complex and noisy information on the basis of similarity to 
a memory. 

Winner-take-all computation⁴ is one of the simplest competitive 
neural-network models, inspired by the lateral inhibition and 
competition observed among biological neurons in the brain¹⁴. In this 
model, the output of a neuron is ON if and only if the weighted sum 
of all binary inputs is the largest among all neurons (Fig. 1a). Here, in 
a winner-take-all neural network, the weight matrix associated with 
each output is referred to as a ‘memory’. As shown in Fig. 1b, a simple 
training algorithm involves using the target patterns as weights. The 
example network has two memories; in other words, it ‘remembers’ 
two patterns, ‘L’ and ‘T’. The network ‘recognizes’ a pattern by 
comparing it to all memories and identifying which memory the pattern is 
most similar to: the output associated with this memory will be ON 
and all other outputs will be OFF. For instance, a corrupted ‘L’ with 
the last bit flipped from 1 to 0 can be recognized as ‘L’, because it will 
result in y₁ (the output of the neuron remembering ‘L’) being ON and y₂ 
(the output of the neuron remembering ‘T’) being OFF. 
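
The winner-take-all rule described above can be sketched in a few lines of Python. This is only an illustration of the abstract model, not of the DNA circuit. The 3 × 3 pixel layouts chosen for ‘L’ and ‘T’ are assumptions (the text gives them only as a figure), as are the function names and the choice to return all zeros when the largest sum is tied.

```python
# Hypothetical 3 x 3 pixel layouts for 'L' and 'T', read row by row.
# The source shows these only in a figure, so the exact bits are an assumption.
L = [1, 0, 0,
     1, 0, 0,
     1, 1, 1]
T = [1, 1, 1,
     0, 1, 0,
     0, 1, 0]


def weighted_sums(x, memories):
    """Return one weighted sum per memory: s_j = sum_i x_i * w_ij."""
    return [sum(xi * wi for xi, wi in zip(x, w)) for w in memories]


def winner_take_all(x, memories):
    """Return binary outputs y_j, with y_j = 1 only for the unique largest sum.

    If the largest sum is tied, no neuron wins and every output stays 0
    (an assumed convention for this sketch).
    """
    s = weighted_sums(x, memories)
    top = max(s)
    if s.count(top) > 1:
        return [0] * len(s)
    return [1 if sj == top else 0 for sj in s]


memories = [L, T]           # "training": use the target patterns as weights
corrupted_L = L[:-1] + [0]  # last bit of 'L' flipped from 1 to 0

print(weighted_sums(corrupted_L, memories))    # [4, 2]
print(winner_take_all(corrupted_L, memories))  # [1, 0]
```

With these assumed layouts the corrupted ‘L’ still overlaps its own memory in four positions and the ‘T’ memory in only two, so the first output is ON.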

The winner-take-all function can be broken into five subfunctions, 
each of which can be implemented with a simple chemical reaction 
(Fig. 1c). First, weight multiplication of xᵢ × wᵢⱼ (where xᵢ is a binary 
input and wᵢⱼ is an analogue weight) is implemented with reactions 
wherein an input species Xᵢ catalytically converts a weight species 
Wᵢⱼ to an intermediate product Pᵢⱼ. If Xᵢ is absent, then no Pᵢⱼ will be 
produced; if Xᵢ is present, then the final concentration of Pᵢⱼ will be 
determined by the initial concentration of Wᵢⱼ, thus setting the value 
of the weighted input. Second, summation is implemented with 
reactions that convert all intermediate species Pᵢⱼ within the same neuron 
to a common weighted-sum species Sⱼ. Third, comparison of weighted 
sums to determine which is the largest is implemented with a set of 
‘pairwise annihilation’ reactions, wherein each weighted-sum species 
Sⱼ destroys any other weighted-sum species Sₖ until only a single winner 
remains. Fourth, signal-restoration reactions bring the concentration 
of the winner species back to a predetermined output value: the final 
concentration of a winning output species Yⱼ corresponds to the initial 
concentration of a restoration-gate species RGⱼ. Last, reporting 
reactions are used to convert each output Yⱼ to a fluorescent signal Fluorⱼ. 

All reactions except pairwise annihilation and signal restoration 
naturally take place sequentially, because the product of a previous 
reaction is a reactant of the next one. Because there are common reac- 
tants in the annihilation and restoration reactions, we used different 
rates to control their order: the former has a much faster rate constant 
than the latter, so a winner that survives all fast competitions is then 
converted slowly to an output signal. 
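
The effect of this rate separation can be illustrated with a toy mass-action simulation of two competing weighted-sum species. This is a rough sketch under stated assumptions, not the authors' model: restoration is collapsed into a single catalytic step (S + RG → S + Y), concentrations are in units of the standard concentration, time units are arbitrary, and the rate constants `k_fast` and `k_slow` are hypothetical values chosen only to respect a roughly 100:1 ratio between annihilation and restoration.

```python
def simulate(s1, s2, rg=1.0, k_fast=100.0, k_slow=1.0, dt=2e-4, t_end=20.0):
    """Forward-Euler integration of a two-species winner-take-all toy model.

    Annihilation (fast):   S1 + S2 -> waste
    Restoration (slow):    Sj + RGj -> Sj + Yj   (Sj acts as a catalyst)
    Returns the final output concentrations (y1, y2).
    """
    rg1 = rg2 = rg
    y1 = y2 = 0.0
    for _ in range(int(t_end / dt)):
        annihilated = k_fast * s1 * s2 * dt
        restored1 = k_slow * s1 * rg1 * dt
        restored2 = k_slow * s2 * rg2 * dt
        s1 -= annihilated
        s2 -= annihilated
        rg1 -= restored1
        y1 += restored1
        rg2 -= restored2
        y2 += restored2
    return y1, y2


y1, y2 = simulate(0.7, 0.3)
print(f"winner output: {y1:.3f}, loser output: {y2:.3f}")
```

In this toy model the species with the higher initial concentration survives the fast competition and then slowly converts its restoration gate to output, while the other output stays close to zero because its weighted-sum species is destroyed before it can catalyse much restoration.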

Weight multiplication and signal restoration are both catalytic 
reactions, implemented with a pair of seesawing reactions¹¹ (Fig. 1e, 
Extended Data Fig. 1). An input Xᵢ (or weighted sum Sⱼ) species first 
interacts with a weight Wᵢⱼ (or restoration gate RGⱼ) species through 
a reversible strand-displacement reaction⁹ to release an intermediate 
product Pᵢⱼ (or output Yⱼ) species. A fuel strand XFᵢ (or YFⱼ) then frees 
the input (or weighted sum) species for more catalytic cycles. As long 
as the fuel strand is in excess, all weight (or restoration gate) molecules 
will eventually be converted to intermediate (or output) molecules. 
Summation is implemented with a single seesawing reaction 
facilitated by a summation gate SGⱼ (Extended Data Fig. 1). The reaction is 
reversible by itself but drained forward by the downstream irreversible 
reaction of pairwise annihilation. 

The annihilation reaction is implemented with cooperative 
hybridization¹³ (Fig. 1f). One weighted-sum strand Sⱼ can bind to a toehold 
on one side of an annihilator molecule Anhⱼₖ and branch-migrate to 
the middle point of the double-stranded domain. If only Sⱼ is 
present, then this process is completely reversible and no molecules will 
be consumed. However, if another weighted-sum strand Sₖ is also 
present, then it can bind to another toehold on the opposite side of 
the annihilator and also branch-migrate to the middle point of the 
double-stranded domain. When the Sⱼ and Sₖ strands reach the middle 
point simultaneously, the annihilator will be split apart into two waste 
molecules. Because neither waste molecule has a toehold exposed, it 
cannot interact with any other molecules. The annihilation reaction 
shown in Fig. 1f is designed to be roughly 100 times faster than the 
signal-restoration reaction shown in Fig. 1e, owing to the two extra 
nucleotides in both toeholds on the annihilator; it is known that the 
rate of strand-displacement reactions increases exponentially with 
toehold length¹⁵,¹⁶. 

Reporting is implemented with an irreversible strand-displacement 
reaction, wherein an output strand Yⱼ interacts with a double-stranded 
Fig. 1 | Winner-take-all neural network and its DNA implementation. 
a, A winner-take-all (WTA) neural network with m memories that each 
has n bits; x₁ to xₙ and y₁ to yₘ are binary inputs and outputs, respectively; 
wᵢⱼ (1 ≤ i ≤ n and 1 ≤ j ≤ m) are analogue weights of positive, real numbers; 
sⱼ and sₖ (1 ≤ j ≠ k ≤ m) are weighted sums of the inputs. b, Example 
pattern recognition using target patterns as weights. Each 9-bit pattern is 
shown in a 3 × 3 grid. Each black or coloured pixel indicates a 1 and each 
white pixel indicates a 0. The two target patterns correspond to the letters 
‘L’ and ‘T’, respectively. If the input pattern is corrupted (for example, the 
last bit of ‘L’ is flipped from 1 to 0, as indicated by the orange cross), then 
the neural network can still recognize it as being more similar to ‘L’ than 
to ‘T’, because the weighted sum using ‘L’ as weights is still larger than 
the weighted sum using ‘T’ as weights. c, Chemical-reaction-network 
implementation. The concentrations of chemical species Xᵢ, Wᵢⱼ, Sⱼ and 
Yⱼ correspond to the values of variables xᵢ, wᵢⱼ, sⱼ and yⱼ, respectively. The 
species in black are needed as part of the function, whereas the species 
in grey are needed to facilitate the reactions. The waste molecules are not 
shown in the reactions. k;and k, are the rate constants of the pairwise- 
annihilation and signal-restoration reactions, respectively. d, DNA-strand- 
displacement implementation. The initial test tube (left) shows all DNA 
species with 1 ≤ i ≤ n and 1 ≤ j ≠ k ≤ m. The final test tube (right) shows 
only the product species after a set of input strands are added, with i, j 
and k being a subset of all possible numbers depending on the specific 
input. Zigzag lines indicate short (5 or 7 nucleotide) toehold domains 
and straight lines indicate long (15 or 20 nucleotide) branch-migration 
domains in DNA strands, with arrowheads marking their 3’ ends. Each 
domain is labelled with a name and assigned a unique DNA sequence, 
with asterisks in the names indicating sequence complementarity. Strand 
modifications are labelled as F and Q, where F indicates a fluorophore 
and Q indicates a quencher. e, Signal-restoration reaction. The grey circle 
with an arrow indicates the direction of the catalytic cycle. f, Pairwise- 
annihilation reaction. Representative (not all possible) states are shown. 
In e and f, arrows with black-filled and white-filled arrowheads indicate 
the forwards and backwards directions of a reaction step, respectively. 
The mechanisms of weight multiplication, summation and reporting 
reactions are shown in Extended Data Fig. 1. DNA sequences are listed in 
Supplementary Table 1. 
Fig. 2 | Experimental characterization of winner-take-all DNA neural 
networks. a, Two-species winner-take-all behaviour. The standard 
concentration is 50 nM (1×). The circuit is composed of two weighted- 
sum strands (S₁ and S₂), an annihilator molecule ([Anh₁,₂] = 75 nM 
(1.5×)), two restoration gates ([RG₁] = [RG₂] = 50 nM (1×)), two 
fuel strands ([YF₁] = [YF₂] = 100 nM (2×)) and two reporters 
([Rep₁] = [Rep₂] = 100 nM (2×)). Initial concentrations of S₁ and S₂ 
are shown as fractions of the standard concentration. The diagonal line 
indicates equal concentrations of both strands. Fluorescence kinetics data 
are shown over the course of 2.5 h, normalized using a common minimum 
and maximum fluorescence level (Methods section ‘Data normalization’). 
To clearly illustrate the difference between the two output trajectories, 
the background below the data points is shown in the same colour (with 
some transparency) as the data points. b, A 4-bit pattern-recognition 
circuit. In the weighted-sum layer of the circuit diagram (top left), each 
wire corresponds to a weight molecule, all wires from the same input 
require a common fuel strand and all wires to the same output require 

a common summation gate. Thus, a circuit that can remember any two 
4-bit patterns is composed of 25 molecules (4 inputs, 14 molecules in 

the weighted-sum layer and 7 molecules in the winner-take-all layer). 


reporter molecule Rep; (Extended Data Fig. 1) to separate the 
fluorophore- and quencher-labelled strands in the reporter, resulting 
in increased fluorescence. Overall, the implementation of an arbitrary 
winner-take-all neural network can be mapped systematically to a see- 
saw DNA circuit (Extended Data Fig. 2). 

We started the experimental demonstration with a two-species 
winner-take-all function (Fig. 2a), which is similar to approximate 
majority¹⁷ and consensus network¹⁸ functions. If the initial 
concentration of one weighted-sum species (S₁ or S₂) is higher than that of 
the other, then we expect the corresponding output strand (Y₁ or Y₂) 
to be released catalytically and the fluorescent signal to reach an ideal 
ON state, while the other output signal remains at an ideal OFF state. 
The data agree with the expected overall circuit behaviour, and lead to 
two main observations. First, the circuit computed an ON state faster 
with a larger difference between the two species, as shown in the plots 
farther away from the diagonal line in Fig. 2a. This is because the signal- 
restoration reaction reaches completion faster with a larger amount of 
catalyst, which is the leftover amount of the winner after the annihi- 
lation reaction. Second, among experiments for which the differences 
between the two species are the same, the circuit maintained a cleaner 
OFF state with lower initial concentrations of the two species, as shown 
in the plots that are equidistant to the diagonal line but closer to the 
bottom left corner of the grid. This is because a small fraction of the 
weighted-sum strands will interact with a restoration-gate molecule 

However, a circuit that remembers two specific 4-bit patterns requires 
only a subset of the wires in the weighted-sum layer, each corresponding 
to a 1 in the memories (for example, each orange wire in the circuit 
diagram corresponds to a black pixel in the memories). Thus, the example 
circuit is composed of 20 molecules (4 fewer weight molecules and 1 
fewer fuel strand). In each output-trajectory plot (right), dotted lines 
indicate fluorescence kinetics data and solid lines indicate simulations. 
The patterns to the left and right of the arrows indicate input signals and 
output classifications, respectively. Each orange cross indicates a bit-flip 
compared to the memories. The initial concentration of each input strand 
or weight molecule is either 0 or 50 nM; weight fuels (XF and XF2) are 
twice the concentration of weight molecules; the initial concentrations of 
the summation gates, annihilator, restoration gates, restoration fuels and 
reporters are 100 nM (1×), 400 nM (4×), 100 nM (1×), 200 nM (2×) 
and 200 nM (2×), respectively, with a standard concentration of 100 nM 
(details in Supplementary Table 3). In the weighted-sum space (bottom 
left), the two patterns with two corrupted bits are the same distance 
(shown as double-headed arrows) from the diagonal line as the two 
perfect inputs. 


before encountering an annihilator molecule—the stronger the 
runner-up is (that is, with a higher concentration), the more it can 
escape the process of being completely annihilated. These observations 
suggest that the DNA circuit does not yield a perfect winner-take-all 
behaviour, but that it does compute correctly for competitors that are 
not too similar to each other and are not both too strong. 

Next, we added a weighted-sum layer to the winner-take-all circuit 
to demonstrate recognition of 4-bit patterns (Fig. 2b). Using the two 
target patterns as weights, the perfect input patterns each triggered 
the desired output trajectory to turn ON, indicating that the inputs 
were recognized correctly. When one or two bits of the input patterns 
were flipped, either from a 1 to a 0 or vice versa, the circuit still yielded 
the desired output for all six examples that are classifiable. The other 
eight possible inputs are not classifiable because they result in equal 
weighted sums (s₁ = s₂). Interestingly, the circuit behaviour was better 
for the inputs with 2-bit corruptions than for the perfect inputs: the ON 
trajectories reached completion just as fast and the OFF trajectories 
remained lower. This result can be understood by looking at the input 
patterns in the weighted-sum space (Fig. 2b, bottom left): all four inputs 
are equidistant to the diagonal line and the corrupted patterns are closer 
to the bottom left corner of the space. Because catalytic reactions are 
used to implement weight multiplication, together with thresholding 
reactions, the circuit can also handle input concentrations that 
deviate from the ideal high or low concentration (Extended Data Fig. 3). 
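
The split of the 16 possible 4-bit inputs into classifiable and tied cases can be checked by enumeration. The sketch below uses two stand-in memories, since the actual patterns appear only in Fig. 2b; the assumption is that each memory has two 1s, the two memories share one bit, and one bit is unused by both, which is consistent with the molecule counts given in the caption.

```python
from itertools import product

# Hypothetical memories: two 4-bit patterns with two 1s each, sharing one bit.
# The real patterns are shown only in Fig. 2b; these are stand-ins.
MEMORIES = [(1, 1, 0, 0), (0, 1, 1, 0)]


def classify(x):
    """Return the index of the winning memory, or None when the sums tie."""
    s1, s2 = (sum(a * b for a, b in zip(x, w)) for w in MEMORIES)
    if s1 == s2:
        return None
    return 0 if s1 > s2 else 1


results = {x: classify(x) for x in product((0, 1), repeat=4)}
classifiable = [x for x, r in results.items() if r is not None]
ties = [x for x, r in results.items() if r is None]

print(len(classifiable), len(ties))  # 8 8
```

Under these assumptions, eight inputs are classifiable (the two perfect patterns plus six corrupted ones) and eight give equal weighted sums, matching the counts stated in the text.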




[Fig. 3 graphic; only text fragments survive. Panel d axes: weighted sums (0.0 to 1.0), for 13,936 classifiable handwritten digits (left) and 36 tested handwritten digits (right). Panel e axes: time (h) against output for experiment and simulation, grouped by deviation from the memory (bits).] 

Panel c, number of WTA species (n, number of bits in a pattern): 

Species                  Number 
Input, Xᵢ                b | n = 20 | 100 
Weight, Wᵢⱼ              b × m = 40 
Fuel, XFᵢ                35 (b ≤ 35 ≤ n) 
Summation gate, SGⱼ      m = 2 
Annihilator, Anhⱼₖ       ᵐC₂ = 1 
Restoration gate, RGⱼ    m = 2 
Fuel, YFⱼ                m = 2 
Reporter, Repⱼ           m = 2 
Total                    104 | 184 


Fig. 3 | A winner-take-all DNA neural network that recognizes 
100-bit patterns as one of two handwritten digits. a, Weights determined 
as the average of 100 ‘6’s and ‘7’s from the MNIST database. The value 
of each pixel (for example, 0.036 for the 35th pixel in ‘6’ and 0.062 for 
the 16th pixel in ‘7’) was used to determine the concentration of each 
weight molecule, relative to a standard concentration of 100 nM (for 
example, [W₃₅,₁] = 3.6 nM and [W₁₆,₂] = 6.2 nM). The concentrations of 
the fuel strands that facilitate the weight multiplication reactions were 
twice that of their respective weight molecules. b, Example binary inputs 
with each 1 and 0 corresponding to the presence and absence of an input 
strand, respectively. The concentration of each input strand present was 
1/b x 100 nM =5 nM, where b = 20 is the total number of 1s in each input. 
The orange crosses indicate bit-flips compared to the memories (that is, 
weight matrices) shown in a. There are 12 flipped bits in each example. 
Because the total number of 1s in each input pattern is the same as the total 
number of non-zero weights in the memories, it is always the case that half 
of the flipped bits are associated with non-zero weights. c, Circuit diagram 
and the number of distinct species in the circuit. For the total number of 
species, the two values correspond to the number of species for a specific 
number b of inputs (left) and for all n possible inputs (right). d, The 
13,936 classifiable digits (left; large green and yellow points) correspond 
to 98% of all ‘6’s and ‘7’s in the MNIST database. Test input patterns were 
chosen (right; large green and yellow points; 36 in total) on the basis of 
their locations in the weighted-sum space. The lines labelled s₁ = s₂ ± 0.15 
indicate a 15% margin to the diagonal line, within which we expect the 
pattern recognition to be experimentally difficult. The light grey points 
correspond to non-classifiable (left) or non-tested (right) digits. 
e, Recognizing handwritten digits with up to 30 flipped bits compared to 
the ‘remembered’ digits. Dotted lines indicate fluorescence kinetics data 
and solid lines indicate simulations. The input pattern is shown in each 
plot. Note that 40 is the maximum number of flipped bits because all 
patterns have exactly 20 1s. Weights and inputs are listed in Supplementary 
Table 2. The initial concentrations of all species are listed in Extended Data 
Fig. 10 (details in Supplementary Table 3). 
To understand the theoretical limits of the scalability and power of 
winner-take-all DNA neural networks, in the context of simply using 
the target patterns as weights, we now address the following three ques- 
tions. The first is the number of distinct target patterns that can be 
remembered simultaneously. Any set of patterns that consists of the 
same number of 1s can be remembered (Methods, Theorem 1). For 
example, the largest set of 9-bit patterns that can be remembered, each 
consisting of five 1s, consists of ⁹C₅ = 126 patterns. Moreover, any set 
of patterns can be remembered if it does not contain a pattern in which 
all 1s are a subset of 1s in another pattern (Methods, Theorem 2). The 
second question concerns which corrupted patterns can be recognized. 
All patterns with fewer than b — o corrupted bits can be recognized, 
where b is the total number of 1s and o is the maximum number of 
overlapped 1s in all target patterns (Methods, Theorem 3). For example, 
all patterns with fewer than three corrupted bits can be recognized for 
the 9-bit target patterns ‘L’ and ‘T’ shown in Fig. 1b, because b = 5 and 
o = 2. Moreover, some patterns with more than b − o corrupted bits can 
still be recognized; for example, in all possible 9-bit patterns, there are 
128, 102 and 30 patterns with three, four and five corrupted bits, 
respectively, that can be recognized as ‘L’ or ‘T’. We chose 28 example 9-bit 
patterns with an increasing number of corrupted bits from one to five, 
and demonstrated that the DNA neural network correctly classified all 
examples (Extended Data Fig. 4). The final question asks how the size 
of the DNA circuit scales with an increasing number of more complex 
patterns. In general, constructing a network that can remember m dis- 
tinct n-bit patterns requires n input strands, n x m weight molecules 
and n fuel strands for weight multiplication, m summation gates, ᵐC₂ 
annihilators, m gates and m fuel strands for signal restoration, and m 
reporters, totalling n × m + 2n + 4m + ᵐC₂ molecules. However, for a 
specific set of target patterns, only a subset of the weight molecules are 
required, each corresponding to a 1 in the patterns. 
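
Both the recognition counts and the molecule-count formula in this paragraph can be checked with a short enumeration. The sketch assumes hypothetical pixel layouts for ‘L’ and ‘T’ that satisfy b = 5 and o = 2 (the real layouts appear only in Fig. 1b); the counts depend only on b, o and the number of bits, not on the exact layout. Function names are illustrative.

```python
from itertools import product
from math import comb

# Hypothetical 3 x 3 layouts with five 1s each (b = 5) and two shared 1s (o = 2).
L = (1, 0, 0, 1, 0, 0, 1, 1, 1)
T = (1, 1, 1, 0, 1, 0, 0, 1, 0)
MEMORIES = (L, T)


def recognized_by_distance(memories):
    """Count inputs with a unique winner, keyed by Hamming distance to it."""
    counts = {}
    for x in product((0, 1), repeat=len(memories[0])):
        sums = [sum(a * b for a, b in zip(x, w)) for w in memories]
        top = max(sums)
        if sums.count(top) != 1:
            continue  # tied weighted sums: not classifiable
        winner = memories[sums.index(top)]
        d = sum(a != b for a, b in zip(x, winner))
        counts[d] = counts.get(d, 0) + 1
    return counts


def molecule_count(n, m):
    """Total species for m memories of n bits: n*m + 2n + 4m + C(m, 2)."""
    return n * m + 2 * n + 4 * m + comb(m, 2)


counts = recognized_by_distance(MEMORIES)
print(counts[3], counts[4], counts[5])  # 128 102 30
print(molecule_count(4, 2))             # 25
```

Every input with fewer than b − o = 3 corrupted bits is recognized (2 × 9 patterns at distance one and 2 × 36 at distance two), and the counts at three, four and five corrupted bits agree with the figures quoted above.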

To demonstrate the scalability and power of winner-take-all DNA 
neural networks experimentally, we chose a task that is visually inter- 
esting: recognizing handwritten digits. Some aspects of this task are 
computationally non-trivial, such as distinguishing a sloppy ‘4’ from a 
sloppy ‘9’. The patterns of digits were taken from the Modified National 
Institute of Standards and Technology (MNIST) database¹⁹, which is 
commonly used to test machine learning algorithms²⁰. We converted 
the original patterns to binary patterns with 20 1s on a 10 x 10 grid, 
averaged 100 example ‘6’ and ‘7’ patterns, and selected and normalized 
the top 20 pixels as weights (Fig. 3a, Methods section “Neural network 
training and testing’). The value of each analogue weight was then 
implemented with the concentration of a weight molecule. The test 
inputs remained binary patterns, in which each 1 or 0 corresponded to 
the presence or absence of an input strand, respectively (Fig. 3b). The 
theoretical limits of the winner-take-all neural networks with analogue 
weights are similar to those with binary weights (Methods, Theorems 
4 and 5). In total, 104 distinct molecules were used for testing any 
specific input pattern out of 184 distinct molecules for all possible 
inputs (Fig. 3c). 
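
The training step described here (average the example patterns, keep the top 20 pixels, normalize) can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the authors' procedure: normalizing the kept pixels so that they sum to 1 is an assumption, the helper names are hypothetical, and the random patterns merely stand in for binarized MNIST digits.

```python
import random


def train_memory(examples, top=20):
    """Average binary examples pixel-wise, keep the `top` largest pixels and
    normalize the kept values so that they sum to 1 (assumed normalization)."""
    n = len(examples[0])
    mean = [sum(e[i] for e in examples) / len(examples) for i in range(n)]
    keep = sorted(range(n), key=lambda i: mean[i], reverse=True)[:top]
    total = sum(mean[i] for i in keep)
    weights = [0.0] * n
    for i in keep:
        weights[i] = mean[i] / total
    return weights


def concentrations_nM(weights, standard_nM=100.0):
    """Map each analogue weight to a weight-molecule concentration in nM."""
    return [w * standard_nM for w in weights]


# Synthetic stand-in data: 100 random 10 x 10 binary patterns with 20 ones each.
rng = random.Random(0)
examples = []
for _ in range(100):
    ones = set(rng.sample(range(100), 20))
    examples.append([1 if i in ones else 0 for i in range(100)])

weights = train_memory(examples)
conc = concentrations_nM(weights)
print(sum(1 for w in weights if w > 0), round(sum(conc), 6))
```

Under this assumed normalization a weight of 0.036 maps to 3.6 nM at a 100 nM standard concentration, which matches the example values given in the Fig. 3 caption.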

In the MNIST database, there are more than 14,000 example hand- 
written ‘6’ and ‘7’ digits. On the basis of the understanding that we have 
established from the experimental characterization of smaller winner- 
take-all circuits, we looked at all example patterns in the weighted-sum 
space (Fig. 3d, Extended Data Fig. 5a): 2% of the patterns are on the 
wrong side of the diagonal line, which means that it is impossible for the 
DNA circuit to recognize them correctly; 8% of the patterns are fairly 
close to the diagonal line (within a 15% margin), which we expect to 
be experimentally difficult; however, the remaining 90% of the patterns 
are far enough from the diagonal line that we expect correct recogni- 
tion. Therefore, we chose 36 representative example patterns from the 
last category, ensuring both uniform distribution in the weighted-sum 
space and the full range of bit deviation from the memories (Methods 
section ‘Neural network training and testing’). As shown in the exper- 
imental data (Fig. 3e, Extended Data Fig. 5d), the perfect patterns 
(the weights converted to binary) each yield the desired circuit output. 
More importantly, patterns that increasingly deviate from the memories 
were also recognized, with up to 30 flipped bits. Similar to observations 
in the smaller DNA neural networks, some of the patterns that are 
visually more challenging to recognize are not necessarily more diffi- 
cult for the DNA circuit—a desirable property of the winner-take-all 
computation. 

We have shown that the winner-take-all DNA neural networks scale 
well to more complex patterns. Next, we explore whether they could 
also be used to remember an increasing number of distinct patterns 
simultaneously. The pairwise-annihilation approach alone is not well 
suited for scaling up the number of patterns because the number of 
annihilators grows quadratically with the number of patterns. We show 
that the three-species winner-take-all function was still robust enough 
(Extended Data Fig. 6a) to allow the construction of a DNA neural 
network that remembers three 100-bit patterns. However, the compe- 
tition became harder with more competitors: the reaction rates for 
multiple annihilation pathways could be matched approximately but 
not perfectly (Methods section ‘Sequence design’, Extended Data
Fig. 6b, c), and it took much longer for the annihilation reactions to 
yield a winner and for the signal level of the winner to be fully restored 
(Extended Data Fig. 7). Using the same method, it would be difficult 
to construct networks that remember more patterns. We therefore pro- 
pose an alternative approach that first divides the target patterns into 
groups and then uses multiple distinct group identities to classify the 
patterns (Fig. 4a). The nine digits ‘1’–‘9’ can be divided into three
groups in two ways (shown as three rows and three columns in Fig. 4b),
such that a pair of outputs corresponds uniquely to each digit (Fig. 4d).
For example, a ‘4’ is recognized if and only if $y_1 = 1$ and $z_1 = 1$ (where $y_1$
is the output identifying the first row and $z_1$ is the output identifying
the first column). With this grouping approach, nine distinct patterns
can be recognized using only $\binom{3}{2} \times 2 = 6$ annihilators, which would
otherwise require $\binom{9}{2} = 36$ annihilators. In total, 225 distinct molecules
were used for testing any specific input pattern out of 305 distinct 
molecules for all possible inputs (Fig. 4c). 
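
The annihilator counts and the row/column decoding above can be checked with a short Python sketch. The helper names and the 3 × 3 layout `GRID` are assumptions for illustration: only the position of ‘4’ (first row, first column) is taken from the text, and the remaining positions are placeholders rather than the grouping shown in Fig. 4b.

```python
from math import comb


def annihilators_pairwise(n_patterns):
    """One annihilator per pair of competing weighted-sum species."""
    return comb(n_patterns, 2)


def annihilators_grouped(n_groups, n_groupings):
    """Each grouping runs its own winner-take-all over n_groups species."""
    return comb(n_groups, 2) * n_groupings


# Hypothetical layout: only '4' at row 1, column 1 follows the text.
GRID = [[4, 1, 2],
        [3, 5, 6],
        [7, 8, 9]]


def decode(y, z, grid=GRID):
    """Map one-hot row outputs y and column outputs z to a digit, or None if ambiguous."""
    if sum(y) != 1 or sum(z) != 1:
        return None
    return grid[y.index(1)][z.index(1)]


print(annihilators_grouped(3, 2), annihilators_pairwise(9))
print(decode([1, 0, 0], [1, 0, 0]))
```

The quadratic growth of `annihilators_pairwise` is what makes the grouped design attractive as the number of patterns increases.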

We determined the weights for each group using a simple ‘average 
then subtract’ method (Fig. 4b): take the average of 100 examples per 
in-group digit, subtract the average of 100 examples per out-of-group 
digit, then select and normalize the top 20 pixels (Methods section 
‘Neural network training and testing’). The trade-off of the grouping 
approach is that fewer example patterns can be recognized. With the 
best grouping, 47% of the patterns can potentially be recognized, of 
which 48% are experimentally feasible (with a 15% margin to the 
diagonal line in the normalized weighted-sum space). In general, 
with the same circuit complexity, this alternative approach enables 
a larger set of distinct target patterns to be classified, but with less 
accuracy. Nonetheless, as shown in the experimental data, the 
circuit yields the desired pair of outputs for 99 representative example 
patterns (Fig. 4d, e). 

To facilitate the design of winner-take-all DNA neural networks, we 
developed an online software tool. The WTA Compiler (Extended
Data Fig. 8) converts a user-defined set of memories and test patterns 
into program code that describes a DNA neural network, which can 
then be used to simulate the kinetics of the network. It also provides 
sequences of the DNA strands that are required to construct the DNA 
neural network experimentally. 

It is interesting to compare the performance of winner-take-all neural 
networks with logic circuits. For example, it is possible to distinguish 
whether a 9-bit pattern is more similar to ‘L’ or ‘T’ using a circuit
consisting of 8 logic gates, for all input patterns that we have tested
experimentally. However, a more complex circuit consisting of 21 logic gates
is required to correctly compute the output for all classifiable patterns 
(Extended Data Fig. 9a). Similarly, the 100-bit handwritten digits can 
be recognized by circuits with up to 23 logic gates, if only the example 
patterns that we have tested experimentally are considered. But these 
logic circuits perform poorly when tested against the entire MNIST 
database (Extended Data Fig. 9b). To match the theoretical limit of 
winner-take-all neural networks, measured by the percentage of classifi- 
able patterns, much more complex logic circuits are needed. Importantly, 
varying the concentrations of the weight molecules in the winner-take-all
neural networks would enable the same set of DNA molecules to
be used for different pattern-classification tasks. By contrast, without
reconfigurable circuit architectures, a different set of DNA molecules
would be required for a logic circuit that performs a different task.

The power of winner-take-all DNA neural networks could be explored
further in several directions. Instead of the pairwise-annihilation
approach, a winner could be selected by utilizing competing
resources, which could potentially lead to more scalable and
accurate pattern recognition. It could also provide the possibility of
selecting several winners instead of just one, which in theory is
computationally more powerful. Extending the circuit construction from
single-layer to multi-layer winner-take-all computation, or simply
allowing the outputs of winner-take-all circuits to be connected to
downstream logic circuits, could enable more sophisticated pattern
recognition (such as involving translated and rotated patterns).
Using a variable-gain amplifier, winner-take-all DNA circuits
could be adapted to process analogue inputs, which would enable
a wider range of signal-classification tasks, including applications
in detecting complex disease profiles that consist of mRNA and
microRNA signals. With aptamers, more diverse biomolecules
could be detected.

Fig. 4 | A winner-take-all DNA neural network that recognizes 100-bit
patterns as one of nine handwritten digits. a, Circuit diagram for
recognizing nine distinct patterns using a grouping approach. WTA1
and WTA2 are two separate winner-take-all functions that each yield a
distinct set of outputs ($y_j$ and $z_j$). b, Weights determined using an ‘average
then subtract’ method. The averages of 100 example digits from the MNIST
database are shown grouped by rows (corresponding to outputs $y_j$) and
columns (corresponding to outputs $z_j$). The weight matrix for each group
(boxed patterns; colours correspond to the respective output trajectories
in d) is the average of all in-group digits less the average of all out-of-group
digits. Using this weight matrix, the fraction of experimentally feasible test
patterns from all examples in the MNIST database was calculated for all
possible ways of grouping the nine digits and the best grouping was chosen
and shown here. c, Number of distinct species in the circuit in a. For
the total number of species, the two values correspond to the number of
species for a specific number b of inputs (left) and for all n possible inputs
(right). d, Fluorescence kinetics data (dotted lines) and simulations (solid
lines) of the circuit behaviour with nine representative input patterns
(shown in the plots). e, Fluorescence level of each pair of outputs at 24 h or
longer after the inputs were added, collected from 99 experiments with 11
example patterns per digit. Each coloured point corresponds to an example
pattern from the labelled class of digit; each grey point corresponds
to an out-of-class example pattern. Weights and inputs are listed in
Supplementary Table 2. The initial concentrations of all species are listed
in Extended Data Fig. 10 (details in Supplementary Table 3).

The fact that we were able to use target patterns as weights in 
winner-take-all DNA neural networks opens up immediate possibilities 
for embedding learning within autonomous molecular systems. With 
one additional circuit component that activates weight molecules during
a supervised training process, the DNA circuits would be capable of
activating a specific set of wires in the weight-multiplication layer when
exposed to a specific set of patterns. As widely discussed in experimental
and theoretical studies, learning, the most desirable property
of biochemical circuits, would allow artificial molecular machines
to adapt their functions on the basis of environmental signals during 
autonomous operations. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0289-6.
METHODS


Sequence design. All DNA strands used in the winner-take-all neural networks
were composed of long branch-migration domains and short toehold domains.
Owing to the modularity of the previously developed seesaw DNA motif and the
extended new circuit component (the annihilator), the sequence design was
performed at the domain level. A pool of domain sequences was generated according
to a set of design heuristics that have previously been experimentally validated.
All domains used a three-letter code (A, T and C) to reduce secondary structure
and undesired strand interactions. No domain sequence included runs of more
than four consecutive As or Ts or more than three consecutive Cs, which reduces
synthesis errors. All domain sequences had between 30% and 70% C-content so that
all double-stranded complexes would have similar melting temperatures. Finally,
no pair of domain sequences shared a matching sequence longer than 35% of the
domain length, and all pairs had at least 30% different nucleotides. This ensures
that a strand with a mismatched branch-migration domain will not complete
strand displacement initiated from either the 3′ or the 5′ end. In addition to a
15-nucleotide sequence pool used in previous work, a 20-nucleotide sequence
pool was generated and used in the weight-multiplication layers because of the large
number of molecules used here. The two sequence pools were checked to ensure
that the same pairwise criteria were met. All domains included the clamp design
introduced previously, to reduce leak reactions between initial gate species.

All molecular complexes shared a 5-nucleotide universal toehold domain.
The annihilator complexes had 7-nucleotide toeholds composed of the 5-nucleotide 
universal toehold and a 2-nucleotide extension that matched the 2 nucleotides 
adjacent to the toehold on the upstream seesaw gate. This increased the binding 
energy and thus the effective strand-displacement reaction rate between the anni- 
hilator complexes and the weighted-sum strands, compared to that between the 
signal-restoration gates and the weighted-sum strands. 

To ensure ‘fair competition’ between the weighted-sum species (that is, same 
rates for all pairwise-annihilation reactions), all annihilators within a set of 
winner-take-all computations had identical toehold extensions, and the weighted- 
sum strands had the same single-nucleotide dangle to keep the binding energies 
consistent within a winner-take-all computation. Here, we used up to two sets of 
three annihilators. The extensions and dangle sequences were chosen by estimating 
the binding energies using NUPACK³¹, and the sequences for the second set of
annihilators were chosen with similar energies to those of the first set that worked 
well in the three-species winner-take-all experiments (Extended Data Fig. 6a). 
In addition, the rate of an annihilation reaction could depend on the sequence of 
the branch-migration domains. We measured the rates of 15 catalytic gates, and 
selected two groups of three gates with the closest rates (Extended Data Fig. 6b, c). 
By using these gates for signal restoration, the branch-migration domains in the 
annihilators were determined simultaneously, because the signal-restoration gates 
and annihilators share the same branch-migration domains (Extended Data Fig. 1). 

All DNA sequences are listed in Supplementary Table 1. 

Neural-network training and testing. The winner-take-all DNA neural network 
was tested on patterns derived from the MNIST handwritten-digit database. The
training and testing sets were downloaded and merged into a single database, and
all example patterns of digits ‘1’–‘9’ were retained, totalling 63,097 images. The
original MNIST dataset consists of weight-centred grey-scale images on a 28 × 28
grid. Here, we used binary patterns on a 10 × 10 grid. First, the images were
rescaled to a 12 × 12 grid using Gaussian resampling. The largest 20 bits in each image
were set to 1 and the remaining bits were set to 0. Finally, the digits were re-centred
on a 10 × 10 grid on the basis of their bounding boxes.
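
The last two preprocessing steps can be sketched in plain Python. This is a minimal illustration, assuming the image has already been rescaled to 12 × 12 (the Gaussian resampling step is not reproduced); the function names, the tie-breaking rule and the error raised when the bounding box does not fit are assumptions, not details given in the text.

```python
def binarize_top_k(image, k=20):
    """Set the k largest pixels of a 2D grey-scale image to 1 and all others to 0."""
    rows, cols = len(image), len(image[0])
    flat = [(image[r][c], r, c) for r in range(rows) for c in range(cols)]
    # Sort by descending intensity; ties are broken by position (an arbitrary choice).
    flat.sort(key=lambda t: (-t[0], t[1], t[2]))
    out = [[0] * cols for _ in range(rows)]
    for _, r, c in flat[:k]:
        out[r][c] = 1
    return out


def recentre(binary, size=10):
    """Place the bounding box of the 1s at the centre of a size x size grid."""
    ones = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    r0, r1 = min(r for r, _ in ones), max(r for r, _ in ones)
    c0, c1 = min(c for _, c in ones), max(c for _, c in ones)
    h, w = r1 - r0 + 1, c1 - c0 + 1
    if h > size or w > size:
        # The text does not say how such cases were handled; failing loudly is an assumption.
        raise ValueError("bounding box does not fit the target grid")
    top, left = (size - h) // 2, (size - w) // 2
    out = [[0] * size for _ in range(size)]
    for r, c in ones:
        out[r - r0 + top][c - c0 + left] = 1
    return out


# Toy 12 x 12 image (hypothetical): a bright 4 x 5 block in the top-left corner.
img = [[0.0] * 12 for _ in range(12)]
for r in range(4):
    for c in range(5):
        img[r][c] = 1.0 + r + c
pattern = recentre(binarize_top_k(img, k=20), size=10)
```

Every resulting pattern has exactly 20 bits set to 1, which is the property that the theorems later in the Methods rely on.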

We made a conscious effort to train the neural networks using a simple algo- 
rithm. In the neural networks that remember two or three handwritten digits, for 
each digit, the weight matrices were the average of the first 100 example patterns 
in the database, restricted to the 20 most common bits (that is, the ones with the 
largest averaged values), and normalized to sum to 1. For the nine-digit network, 
all digits were divided into three groups in two ways. For each group, the weight 
matrix was the average of the first 100 examples of the three in-group digits less 
the average of the first 100 examples of the six out-of-group digits. The 20 most 
common bits were retained, and all weight matrices were normalized to sum to 
1.15, to shift the test patterns into a more ideal area in the weighted-sum space. 
The fraction of experimentally feasible test patterns (with a 15% margin to the 
diagonal line in the weighted-sum space for all pairs of species) was calculated for 
all ways of grouping the nine digits, and the best grouping was chosen. The clas- 
sification performance of the network using weights determined by non-negative 
least squares was only slightly better than the performance using weights from the 
simple ‘average then subtract’ method (54% versus 47%). 
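
The ‘average then subtract’ rule can be illustrated with the following Python sketch. The function names, the clipping of negative differences to zero and the toy 4-bit data are assumptions made for the example; the `total` parameter stands for the normalization target mentioned above.

```python
def mean_pattern(examples):
    """Element-wise mean of a list of equally long patterns."""
    n = len(examples[0])
    return [sum(p[i] for p in examples) / len(examples) for i in range(n)]


def average_then_subtract(in_group, out_group, top_k=20, total=1.0):
    """in_group and out_group are lists with one list of example patterns per digit."""
    n = len(in_group[0][0])
    # With the same number of examples per digit, the mean of per-digit means
    # equals the mean over all examples in the group.
    avg_in = mean_pattern([mean_pattern(d) for d in in_group])
    avg_out = mean_pattern([mean_pattern(d) for d in out_group])
    diff = [a - b for a, b in zip(avg_in, avg_out)]
    keep = sorted(range(n), key=lambda i: (-diff[i], i))[:top_k]
    # Assumption: a negative difference cannot be a concentration, so clip it to zero.
    kept = {i: max(diff[i], 0.0) for i in keep}
    norm = sum(kept.values())
    return [total * kept[i] / norm if i in kept else 0.0 for i in range(n)]


# Toy data (hypothetical): two in-group digits and one out-of-group digit, 4 bits each.
in_group = [[[1, 1, 0, 0]], [[1, 0, 1, 0]]]
out_group = [[[0, 0, 1, 1]]]
w = average_then_subtract(in_group, out_group, top_k=2, total=1.0)
w_shifted = average_then_subtract(in_group, out_group, top_k=2, total=1.15)
```

Subtracting the out-of-group average suppresses pixels that are common to many digits, so the retained bits are those that best distinguish the group.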

Experimentally tested input patterns were chosen to represent the whole weighted- 
sum space as well as the full range of bit deviation from the memories of the 
networks. To choose a set of test patterns for a digit, all correctly classified examples 
of that digit with at least a 15% margin in the weighted-sum space were divided 
into six corruption classes. The weighted sums for the digits in each class were then 
clustered using the k-medoids algorithm, and an example test pattern was chosen
randomly from each cluster according to a uniform distribution. This ensured 
that the test patterns represented the whole weighted-sum space and not just the 
most common digits. 

Weights and inputs used in all experiments are listed in Supplementary Table 2. 
By exporting each sheet of the Excel file to a .csv file and uploading it to the WTA
Compiler, the weights and inputs can be visually displayed, the inputs analysed
in their weighted-sum space, the kinetic behaviour of the winner-take-all DNA
neural network simulated and DNA sequences generated.

DNA oligonucleotide synthesis. All DNA strands were purchased from Integrated 
DNA Technologies (IDT). The reporter strands with fluorophores and quenchers 
were purified (HPLC) and the other strands were unpurified (standard desalting). 
All strands were shipped lyophilized then resuspended at 100 μM in Tris-EDTA
(TE) buffer, pH 8.0, and stored at 4°C. 

Annealing protocol and buffer condition. Annihilator and gate complexes
were prepared for annealing at 45 μM with top and bottom strands in a 1:1 ratio.
Reporters were prepared at 20 μM with top quencher strands in 20% excess of
bottom strands. The buffer for all experiments and annealed complexes was TE
with 12.5 mM Mg²⁺. Complexes were annealed in a thermal cycler (Eppendorf)
by heating to 90°C for 5 min and then cooling to 20°C at a rate of 0.1°C per 6 s.

Purification. Annealed annihilator and gate complexes were purified using 12%
polyacrylamide gel electrophoresis (PAGE). Double-stranded complex bands were
cut from the gel, chopped into pieces and incubated for 24 h at room temperature in
TE buffer with 12.5 mM Mg²⁺ to allow DNA to diffuse into the buffer. The solution
with purified complexes was recovered and concentrations were determined with 
NanoDrop (Thermo Fisher). Weight matrices for the DNA neural networks that 
remember handwritten digits had 20 gate complexes for each neuron. These gates 
(weight molecules) were annealed individually and then mixed together in the 
appropriate ratio, on the basis of the values of the weights. This mixture was then 
purified via PAGE, recovered and the concentration determined by NanoDrop 
using the weighted-average extinction coefficient. 

Fluorescence spectroscopy. Fluorescence kinetics data were collected every 2, 3 
or 4 min, depending on the overall length of the experiment, using a microplate 
reader (Synergy H1, Biotek). Excitation (emission) wavelengths were 496 nm 
(525 nm) for dye ATTO488, 555 nm (582 nm) for dye ATTO550 and 598 nm 
(629 nm) for dye ATTO590. Experiments were performed in 96-well plates 
(Corning) with 160-μl reaction mixture per well for the nine-digit experiments
and 200-μl reaction mixture per well for all other experiments. Experiments were
performed at a standard concentration of 100 nM for all 4-bit and 100-bit pattern 
recognition and at a standard concentration of 50 nM for all other experiments. 
Initial concentrations of all species are listed in Extended Data Fig. 10. Detailed 
protocols for all experiments are listed in Supplementary Table 3. 

In the nine-digit experiments, six distinct output trajectories were read using 
three distinct fluorophores. Every experiment was run twice, each having half 
of the outputs connected to fluorophore-labelled reporters and the other half to 
non-fluorophore-labelled reporters. Combining the output trajectories from each 
pair of experiments into a single plot allows the observation of all six outputs 
simultaneously. 

Data normalization. All data were normalized from raw fluorescence level to standard
concentration, which is the maximum concentration of an output strand $Y_j$
that is released from gate $RG_j$ and interacts with a double-stranded reporter molecule
$Rep_j$. The fluorescence level that corresponds to standard concentration (1×) was
obtained from the average of the final five measurements from the highest signal
produced from gate $RG_j$ on a plate. Negligible concentration (0×) corresponds to the
background fluorescence of the reaction mixture before any reporter molecules have
been triggered, which was obtained from the first measurement of the lowest signal
produced from gate $RG_j$ on a plate. All experiments on a single plate were normalized
together, allowing direct comparison between the output of a network for different 
input patterns. In the two-species winner-take-all experiments shown in Extended 
Data Fig. 3, the first six columns of data were measured on one plate and the last five 
columns measured on another. In the 9-bit pattern-recognition experiments shown 
in Extended Data Fig. 4, the input patterns with 0-2 corrupted bits were measured 
on one plate and those with 3-5 corrupted bits were measured on another. 
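
A minimal Python sketch of this normalization is given below. It assumes that all traces passed in share one reporter and one plate, and it interprets the ‘highest’ and ‘lowest’ signals as the traces with the largest and smallest mean over their final five readings; that interpretation and the name `normalize_plate` are assumptions rather than details stated in the text.

```python
def normalize_plate(traces):
    """traces: dict mapping a well name to its list of raw fluorescence readings."""
    def final_level(trace):
        tail = trace[-5:]
        return sum(tail) / len(tail)

    # 1x level: mean of the final five readings of the highest trace on the plate.
    one_x = final_level(max(traces.values(), key=final_level))
    # 0x level: first reading of the lowest trace on the plate.
    zero_x = min(traces.values(), key=final_level)[0]
    scale = one_x - zero_x
    return {name: [(v - zero_x) / scale for v in trace]
            for name, trace in traces.items()}


# Toy readings (hypothetical, arbitrary fluorescence units).
raw = {
    "on": [100, 300, 500, 900, 900, 900, 900, 900],
    "off": [100, 110, 120, 120, 120, 120, 120, 120],
}
norm = normalize_plate(raw)
print(norm["on"][-1], norm["off"][-1])
```

Because both reference levels come from the same plate, outputs for different input patterns on that plate can be compared directly.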

Model and simulations. Mass-action simulations were performed using the same
set of reactions and rate constants developed in the seesaw model, with four
additional reactions to model pairwise annihilation:

$$S_j + \mathrm{Anh}_{jk} \underset{k_r}{\overset{k_f}{\rightleftharpoons}} S_j\mathrm{Anh}_{jk}$$

$$S_k + \mathrm{Anh}_{jk} \underset{k_r}{\overset{k_f}{\rightleftharpoons}} S_k\mathrm{Anh}_{jk}$$

Here, $k_f = 2 \times 10^{6}\ \mathrm{M^{-1}\,s^{-1}}$, which is the same as the forward rate constant of the
thresholding reaction in the seesaw model. The reverse rate constant $k_r = 0.4\ \mathrm{s^{-1}}$
was determined using the experimental data shown in Extended Data Fig. 3a.
This rate constant is of the same order as found in a previous study of cooperative
hybridization. Similar to the spurious reactions in the original seesaw model,
temporary toehold binding between any single-stranded species and any
annihilator (or intermediate annihilator species listed above) is also included here.
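
As a rough illustration of how pairwise annihilation behaves under mass-action kinetics, the following Python sketch integrates a two-species system with forward Euler. It is not the authors' simulation code: the two irreversible steps in which the second strand consumes an intermediate are an assumption based on the cooperative-hybridization description, the restoration, reporting and spurious reactions are omitted, and the function name, time step and initial concentrations are illustrative choices. Concentrations are in nM, so the forward rate constant is expressed as 2 × 10⁻³ nM⁻¹ s⁻¹.

```python
def simulate_annihilation(s1, s2, anh, kf=2e-3, kr=0.4, t_end=9000.0, dt=0.05):
    """Forward-Euler integration; concentrations in nM, kf in 1/(nM s), kr in 1/s."""
    c1 = c2 = 0.0  # intermediates S1.Anh and S2.Anh
    for _ in range(round(t_end / dt)):
        r1 = kf * s1 * anh - kr * c1  # S1 + Anh <-> S1.Anh
        r2 = kf * s2 * anh - kr * c2  # S2 + Anh <-> S2.Anh
        r3 = kf * s2 * c1             # S2 + S1.Anh -> waste (assumed irreversible)
        r4 = kf * s1 * c2             # S1 + S2.Anh -> waste (assumed irreversible)
        s1 += dt * (-r1 - r4)
        s2 += dt * (-r2 - r3)
        anh += dt * (-r1 - r2)
        c1 += dt * (r1 - r3)
        c2 += dt * (r2 - r4)
    return {"S1": s1, "S2": s2, "Anh": anh, "S1.Anh": c1, "S2.Anh": c2}


# Hypothetical starting point: 35 nM versus 15 nM with 75 nM annihilator, run for 2.5 h.
final = simulate_annihilation(35.0, 15.0, 75.0)
total_1 = final["S1"] + final["S1.Anh"]
total_2 = final["S2"] + final["S2.Anh"]
print(total_1, total_2)
```

Each completed annihilation removes one strand of each species, so the difference between the two totals is conserved and the initially larger species is the only one left once the smaller is exhausted.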
Code availability. Simulation code is available at the WTA Compiler website.
Theoretical limits of the power of winner-take-all neural networks. The
winner-take-all function shown in Fig. 1a is defined to have:

Inputs: $x = (x_1, x_2, \ldots, x_n)$
Weights: $W = (w_1, w_2, \ldots, w_m)$, with $w_j = (w_{1j}, w_{2j}, \ldots, w_{nj})$
Weighted sums: $s = W \cdot x$
Outputs: $y_j = 1$ if $s_j > s_k\ \forall\, k \neq j$, and $y_j = 0$ otherwise

Definition 1. Let $X = \{x^1, x^2, \ldots, x^m\}$ be a set of m patterns, each with n bits. Let an
example pattern from X be $x^\alpha = (x_1^\alpha, x_2^\alpha, \ldots, x_n^\alpha)$, with $x_i^\alpha \in \{0, 1\}$. We say that a
winner-take-all neural network with weights W remembers X if $y_\alpha = 1$ for all
$1 \le \alpha \le m$ (and $y_j = 0$ for all $j \neq \alpha$) when $x = x^\alpha$.
Theorem 1. If X is a set of m distinct n-bit patterns, each containing exactly b 1s,
then the winner-take-all neural network with $W = (w_1, w_2, \ldots, w_m)$ and
$w_j = (w_{1j}, w_{2j}, \ldots, w_{nj}) = x^j$ (that is, $w_{ij} = x_i^j$) remembers X.
Proof. Consider this network on input $x = x^\alpha$. First, for $j = \alpha$, we calculate
$s_\alpha = x^\alpha \cdot x^\alpha = b$. Second, for $j \neq \alpha$, $x^j \neq x^\alpha$. Because the number of 1s in both of
these patterns is b, the number of indices at which the bits are both 1 is strictly less
than b. Therefore, $s_j = x^j \cdot x^\alpha < b$. Putting the first and second calculations together,
we conclude that $s_\alpha > s_j$ and thus $y_\alpha = 1$ and $y_j = 0$ for all $j \neq \alpha$.

The next theorem is a generalization of Theorem 1. 
Theorem 2. If X is a set of m distinct n-bit patterns, and the 1s in any example
pattern $x^\alpha$ are not a subset of the 1s in another pattern $x^\beta$ (that is, no two example
patterns satisfy $x^\alpha \cdot x^\beta = x^\alpha \cdot x^\alpha$), then the winner-take-all neural network with
$W = (w_1, w_2, \ldots, w_m)$ and $w_j = x^j$ remembers X.
Proof. Consider this network on input $x = x^\alpha$. First, $s_\alpha = x^\alpha \cdot x^\alpha$ and is equal to the
total number of 1s in $x^\alpha$. Second, for $j \neq \alpha$, $s_j = x^j \cdot x^\alpha \neq x^\alpha \cdot x^\alpha$. Third, for all j,
$s_j = x^j \cdot x^\alpha \le x^\alpha \cdot x^\alpha = s_\alpha$. Putting these three constraints together, we conclude that
$s_\alpha > s_j$ and thus $y_\alpha = 1$ and $y_j = 0$ for all $j \neq \alpha$.
Definition 2. In a winner-take-all neural network with $W = (w_1, w_2, \ldots, w_m)$ and
$w_j = x^j$, we say that each $x^j$ is a memory. We say that the network recognizes input
x as memory $x^\alpha$ if $y_\alpha = 1$ (and $y_j = 0$ for all $j \neq \alpha$). We say that a pattern x has c
corrupted bits compared to a memory $x^\alpha$ (or has c-bit deviation from $x^\alpha$) if the
number of indices at which the bits are different (that is, one bit is 0 and the other
is 1 or vice versa) in x and $x^\alpha$ is exactly c. We say that two memories $x^\alpha$ and $x^\beta$ have
o overlapped bits if the number of indices at which the bits are both 1 in these
memories is exactly o.
Theorem 3. If x is a pattern with $c < b - o$ corrupted bits compared to a memory $x^\alpha$,
where b is the total number of 1s in $x^\alpha$ and o is the maximum number of overlapped
bits in $x^\alpha$ and $x^j$ for all $j \neq \alpha$, then the winner-take-all neural network recognizes
x as $x^\alpha$.
Proof. Let $c_0$ be the number of flipped 0s (that is, where 1 in x and 0 in $x^\alpha$ appear
at the same index) and $c_1$ be the number of flipped 1s (that is, where 0 in x and
1 in $x^\alpha$ appear at the same index). First, $s_\alpha = x^\alpha \cdot x = b - c_1$. Second, for $j \neq \alpha$,
$s_j = x^j \cdot x \le o + c_0$ ($s_j$ reaches its maximum when, at the same indices in $x^j$, all
corrupted 1s are 0s and all corrupted 0s are 1s). Third, because $c = c_0 + c_1$ and
$c < b - o$, $o + c_0 = o + c - c_1 < o + b - o - c_1 = b - c_1$. Putting the three constraints
together, we conclude that $s_\alpha > s_j$ and thus $y_\alpha = 1$ and $y_j = 0$ for all $j \neq \alpha$.


Next, we consider a much larger set of n-bit patterns, $X = \{x^1, x^2, \ldots, x^M\}$ with
$M > m$.
Definition 3. Let each example pattern $x^l = (x_1^l, x_2^l, \ldots, x_n^l)$ be associated with
a desired output $y^l = (y_1^l, y_2^l, \ldots, y_m^l)$, with $y_j^l \in \{0, 1\}$ and $\sum_{j=1}^{m} y_j^l = 1$ (that is,
only one specific $y_\alpha^l = 1$ and $y_j^l = 0$ for all $j \neq \alpha$). If $y_\alpha^l = 1$, then we say that $x^l$ is
a pattern in class $\alpha$.
Let $\tilde{x}^\alpha = (\tilde{x}_1^\alpha, \tilde{x}_2^\alpha, \ldots, \tilde{x}_n^\alpha) = \sum_l x^l$ for all l with $y_\alpha^l = 1$
(that is, the sum of all patterns in class $\alpha$). Let $t_\alpha = \sum_i \tilde{x}_i^\alpha$ for the b largest
components of $\tilde{x}^\alpha$. Let $\bar{x}^\alpha = (\bar{x}_1^\alpha, \bar{x}_2^\alpha, \ldots, \bar{x}_n^\alpha)$, with $\bar{x}_i^\alpha = \tilde{x}_i^\alpha / t_\alpha$ if $\tilde{x}_i^\alpha$ is one of the b
largest values and $\bar{x}_i^\alpha = 0$ otherwise (that is, the averaged pattern for class $\alpha$, restricted
to the b most common bits and normalized to sum to 1). Let $\hat{x}^\alpha = (\hat{x}_1^\alpha, \hat{x}_2^\alpha, \ldots, \hat{x}_n^\alpha)$
with $\hat{x}_i^\alpha = 1$ if $\bar{x}_i^\alpha > 0$ and $\hat{x}_i^\alpha = 0$ if $\bar{x}_i^\alpha = 0$. Let $\hat{X} = \{\hat{x}^1, \hat{x}^2, \ldots, \hat{x}^m\}$ be the set of
averaged patterns converted to binary.

The next two theorems are similar to Theorems 1 and 3, but generalized to using
averaged training patterns as analogue weights rather than using a single training
pattern (that is, target pattern) as binary weights.

Theorem 4. If X is a set of M distinct n-bit patterns, $\hat{x}^j$ contains exactly b 1s for all j
and $\hat{x}^j \neq \hat{x}^k$ for all $j \neq k$, then the winner-take-all neural network with
$W = (w_1, w_2, \ldots, w_m)$ and $w_j = \bar{x}^j$ remembers $\hat{X}$.
Proof. Consider this network on input $x = \hat{x}^\alpha$. First, we calculate
$s_\alpha = \bar{x}^\alpha \cdot \hat{x}^\alpha = \sum_{i=1}^{n} \bar{x}_i^\alpha = 1$. Second, for $j \neq \alpha$, $\hat{x}^j \neq \hat{x}^\alpha$. Because the number of 1s
in both of these patterns is b, there is at least one index i at which $\hat{x}_i^j = 1$ (and
$\bar{x}_i^j > 0$) and $\hat{x}_i^\alpha = 0$; thus $s_j = \bar{x}^j \cdot \hat{x}^\alpha < \sum_{i=1}^{n} \bar{x}_i^j = 1$. Putting the two constraints
together, we conclude that $s_\alpha > s_j$ and thus $y_\alpha = 1$ and $y_j = 0$ for all $j \neq \alpha$.
Definition 4. In a winner-take-all neural network with $W = (w_1, w_2, \ldots, w_m)$
and $w_j = \bar{x}^j$, we say that each $\bar{x}^j$ is a memory and each $x = \hat{x}^j$ is a perfect input. We
say that a binary pattern x has c-bit deviation from a memory $\bar{x}^\alpha$ if the number of
indices at which the bits are different in x and $\hat{x}^\alpha$ is exactly c. We say that two
memories $\bar{x}^\alpha$ and $\bar{x}^\beta$ have overlap $o = \max\{\bar{x}^\alpha \cdot \hat{x}^\beta, \bar{x}^\beta \cdot \hat{x}^\alpha\}$. We say a bit i is no
more than average in $\bar{x}^\alpha$ if $\bar{x}_i^\alpha \le 1/b$, where b is the total number of 1s in $\hat{x}^\alpha$.
Theorem 5. If x is a pattern with c-bit deviation from a memory $\bar{x}^\alpha$, where
$c < b(1 - o)$, b is the total number of 1s in $\hat{x}^\alpha$ and o is the maximum overlap in $\bar{x}^\alpha$
and $\bar{x}^j$ for all $j \neq \alpha$, and if all flipped 1s are no more than average in $\bar{x}^\alpha$ and all flipped
0s are no more than average in $\bar{x}^j$ for all $j \neq \alpha$, then the winner-take-all neural
network recognizes x as $\bar{x}^\alpha$.
Proof. Let $c_0$ be the number of flipped 0s (that is, where 1 in x and 0 in $\hat{x}^\alpha$ appear
at the same index) and $c_1$ be the number of flipped 1s (that is, where 0 in x and 1
in $\hat{x}^\alpha$ appear at the same index). First, $s_\alpha = \bar{x}^\alpha \cdot x \ge 1 - c_1/b$. Second, for $j \neq \alpha$,
$s_j = \bar{x}^j \cdot x \le o + c_0/b$. Third, because $c = c_0 + c_1$ and $c < b(1 - o)$, $o + c_0/b = o +
(c - c_1)/b < o + [b(1 - o) - c_1]/b = 1 - c_1/b$. Putting the three constraints together,
we conclude that $s_\alpha > s_j$ and thus $y_\alpha = 1$ and $y_j = 0$ for all $j \neq \alpha$.

These are not the strongest results possible, but they provide intuition about 
how the winner-take-all neural network functions, with both binary and analogue 
weights, and how tolerant to errors it is. 

Data availability. All data that support the findings of this study are included in 
the manuscript and its Extended Data. Source Data for Figs. 2-4 and Extended 
Data Figs. 3-7 are provided with the online version of the paper. 




31. Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. 
J. Comput. Chem. 32, 170-173 (2011). 
Extended Data Fig. 1 | DNA implementation of winner-take-all
neural networks. The winner-take-all computation is broken into five
subfunctions: weight multiplication, summation, pairwise annihilation,
signal restoration and reporting. In the chemical reactions listed next
to the five subfunctions, the species in black are needed as part of the
function, the species in grey are needed to facilitate the reactions and
the waste species are not shown. $k_f$ and $k_s$ are the rate constants of the
pairwise-annihilation and signal-restoration reactions, respectively. In
the DNA-strand-displacement implementation, weight multiplication
and signal restoration are both catalytic reactions. The grey circle with an
arrow indicates the direction of the catalytic cycle. Representative, but not
all possible, states are shown for the pairwise-annihilation reaction. Zigzag
lines indicate short (5 or 7 nucleotide) toehold domains and straight
lines indicate long (15 or 20 nucleotide) branch-migration domains in
DNA strands, with arrowheads marking their 3′ ends. Each domain
is labelled with a name, and asterisks in the names indicate sequence
complementarity. Black-filled and white-filled arrowheads indicate the
forwards and backwards directions of a reaction step, respectively. All
DNA sequences are listed in Supplementary Table 1.
Extended Data Fig. 2 | Seesaw circuit implementation of winner-take-all
neural networks. a, Same as Fig. 1a. b, Seesaw circuit diagram for
implementing the winner-take-all neural network. Each black number
indicates the identity of a seesaw node. A total of n + 3m nodes are
required for implementing a winner-take-all neural network with m
memories that each has n bits. The location and absolute value of each
red number indicates the identity and relative initial concentration of a
DNA species, respectively. A red number on a wire connected to a node
(or between two nodes) indicates a free signal molecule, which can be an
input or fuel strand. A red number inside a node indicates a gate molecule,
which can be a weight, summation gate or restoration gate. A red number
on a wire that stops perpendicularly at two wires indicates an annihilator
molecule. A negative red number inside a half node with a zigzag arrow
indicates a reporter molecule.
Extended Data Fig. 3 | Experimental characterization of winner-take-all
DNA neural networks. a, Two-species winner-take-all behaviour. The
experimental data (left, same as Fig. 2a) were used to identify the reverse
rate constant $k_r = 0.4\ \mathrm{s^{-1}}$ of the annihilation reaction in simulations
(right). All fluorescence kinetics data and simulations are shown over
the course of 2.5 h. The standard concentration is 50 nM (1×). Initial
concentrations of the annihilator, restoration gates, fuels and reporters are
75 nM (1.5×), 50 nM (1×), 100 nM (2×) and 100 nM (2×), respectively.
b, A 4-bit pattern recognition circuit with input concentration varying
from 50 nM to 500 nM. In each output trajectory plot, dotted lines
indicate fluorescence kinetics data and solid lines indicate simulation.
The patterns to the left and right of the arrow indicate input signal and
output classification, respectively. c, Applying thresholding to clean up
noisy input signals. The thresholding mechanism has been reported
previously in work on seesaw DNA circuits. The extended toehold in
the threshold molecule has 7 nucleotides. In b and c, to compare the range
of inputs, the concentration of each input strand is shown relative to 50
nM. The initial concentration of each weight molecule is either 0 or 50
nM; weight fuels are twice the concentration of weight molecules. The
initial concentrations of the summation gates, annihilator, restoration
gates, restoration fuels and reporters are 100 nM (1×), 400 nM (4×),
100 nM (1×), 200 nM (2×) and 200 nM (2×), respectively, with a standard
concentration of 100 nM.
Extended Data Fig. 4 | A winner-take-all DNA neural network that
recognizes 9-bit patterns as ‘L’ or ‘T’. In each output trajectory plot,
dotted lines indicate fluorescence kinetics data and solid lines indicate
simulation. The standard concentration is 50 nM (1×). The initial
concentration of each input strand is either 0 or 50 nM (1×). The initial
concentration of each weight molecule is either 0 or 10 nM (0.2×);
weight fuels are twice the concentration of weight molecules. The initial
concentrations of the summation gates, annihilator, restoration gates,
restoration fuels and reporters are 50 nM (1×), 75 nM (1.5×), 50 nM
(1×), 100 nM (2×) and 100 nM (2×), respectively. The patterns to the
left and right of the arrow indicate input signal and output classification,
respectively. In addition to the perfect inputs, 28 example input patterns
with 1-5 corrupted bits were tested. Note that 5 is the maximum number
of corrupted bits, because an ‘L’ with more than 5-bit corruption will be as
similar as or more similar to a ‘T’, and vice versa.
Extended Data Fig. 5 | A winner-take-all DNA neural network
that recognizes 100-bit patterns as one of two handwritten digits.
a, Choosing the test input patterns on the basis of their locations in the
weighted-sum space. b, Overlap between the two memories: ‘6’ and ‘7’.
c, 36 test patterns with the number of flipped bits shown next to their
weighted sums. d, Recognizing handwritten digits with up to 30 flipped
bits compared to the perfect digits. Dotted lines indicate fluorescence
kinetics data and solid lines indicate simulation. The standard
concentration is 100 nM. Initial concentrations of all species are listed in
Extended Data Fig. 10. The input pattern is shown in each plot. Note that
40 is the maximum number of flipped bits because all patterns have
exactly 20 1s.
[Panel axes: time (h); deviation from the memory (bits), binned as 10-13,
14-17, 18-21, 22-25 and 26-40.]
Extended Data Fig. 6 | Three-species winner-take-all behaviour and rate
measurements for selecting DNA sequences in winner-take-all reaction
pathways. a, Fluorescence kinetics data for a three-species winner-take-all
circuit. Initial concentrations of the three weighted-sum species are
shown on top of each plot as a number relative to a standard concentration
of 50 nM (1x). The initial concentrations of the annihilator, restoration
gates, fuels and reporters are 75 nM (1.5x), 50 nM (1x), 100 nM (2x)
and 100 nM (2x), respectively. b, Measuring the rates of 15 catalytic gates.
Fluorescence kinetics data (dotted lines) and simulations (solid lines) of
the signal restoration reaction are shown, with a trimolecular rate constant
(k) fitted using a Markov chain Monte Carlo package
(https://github.com/joshburkart/mathematica-mcmc). The reporting reaction
was needed for the fluorescence readout. Initial concentrations of all
species are listed as a number relative to a standard concentration of
50 nM. c, The 15 catalytic gates sorted and grouped on the basis of their
rate constants. All rate constants are within ±65% of the median. The two
coloured groups of three rate constants are within 5% of the median. These
two groups of catalytic gates were selected for signal restoration in the
winner-take-all DNA neural networks that remember two to nine 100-bit
patterns (Methods section ‘Sequence design’).
diagram. b, Choosing the test input patterns on the basis of their locations in
the weighted-sum space. c, Overlap between the three memories: ‘2’, ‘3’ and
‘4’. d, Recognizing handwritten digits with up to 28 flipped bits compared
[...] Initial concentrations of all species are listed in Extended Data Fig. 10.
The input pattern is shown in each plot. Note that 40 is the maximum number of
flipped bits because all patterns have exactly 20 1s.
Extended Data Fig. 9 | Size and performance analysis of logic circuits
for pattern recognition. a, Logic circuits that determine whether a
9-bit pattern is more similar to ‘L’ or ‘T’. b, Logic circuits that recognize
100-bit handwritten digits. To find a logic circuit that produces correct
outputs for a given set of inputs, with no constraint on other inputs,
we first created a truth table including all experimentally tested inputs
and their corresponding outputs. The outputs for all other inputs
were specified as ‘don’t care’, meaning the values could be 0 or 1. The
truth table was converted to a Boolean expression and minimized in
Mathematica, and then minimized again jointly for multiple outputs
and mapped to a logic circuit in Logic Friday
(https://download.cnet.com/Logic-Friday/3000-20415_4-75848245.html).
In the minimized truth tables shown here, ‘X’ indicates a specific bit of
the input on which the output does not depend. For comparison, minimized
logic circuits were also generated from training sets with a varying total
number of random examples from the MNIST database. The performance of each
logic circuit, defined as the percentage of correctly classified inputs, was
computed using all examples in the database. To make the minimization
and mapping to logic gates computable in Logic Friday, the size of the
input was restricted to the 16 most significant bits, determined on the
basis of the weight matrix of the neural networks.
[Plot axis: training set size.]
b, Weights and example inputs in the neural network that recognizes ‘6’
and ‘7’. c, Weights in the neural network that recognizes ‘1’-‘9’. Weights
and inputs used in all experiments are listed in Supplementary Table 2. 
Detailed protocols for all experiments are listed in Supplementary Table 3. 


LETTER 


https://doi.org/10.1038/s41586-018-0307-8 


Controlling an organic synthesis robot with 
machine learning to search for new reactivity 


Jarosław M. Granda¹, Liva Donina¹, Vincenza Dragone¹, De-Liang Long¹ & Leroy Cronin¹*


The discovery of chemical reactions is an inherently unpredictable
and time-consuming process¹. An attractive alternative is to predict
reactivity, although relevant approaches, such as computer-aided
reaction design, are still in their infancy². Reaction prediction based
on high-level quantum chemical methods is complex³, even for
simple molecules. Although machine learning is powerful for data
analysis⁴,⁵, its applications in chemistry are still being developed⁶.
Inspired by strategies based on chemists’ intuition⁷, we propose that
a reaction system controlled by a machine learning algorithm may
be able to explore the space of chemical reactions quickly, especially
if trained by an expert⁸. Here we present an organic synthesis robot
that can perform chemical reactions and analysis faster than they 
can be performed manually, as well as predict the reactivity of 
possible reagent combinations after conducting a small number of 
experiments, thus effectively navigating chemical reaction space. 
By using machine learning for decision making, enabled by binary 
encoding of the chemical inputs, the reactions can be assessed in real 
time using nuclear magnetic resonance and infrared spectroscopy. 
The machine learning system was able to predict the reactivity of 
about 1,000 reaction combinations with accuracy greater than 80 
per cent after considering the outcomes of slightly over 10 per cent of 
the dataset. This approach was also used to calculate the reactivity of 
published datasets. Further, by using real-time data from our robot, 
these predictions were followed up manually by a chemist, leading 
to the discovery of four reactions. 

Recent progress in automated chemistry⁹,¹⁰, online analytics¹¹ and
real-time optimization¹² suggests that it is possible to construct a
robot that can autonomously explore chemical reactivity. With this in
mind, we have designed, built and programmed a bespoke chemical-handling
robot comprising in-line spectroscopy, real-time data
analysis and feedback mechanisms (Fig. 1a, b). The robot is configured
to execute six experiments in parallel, allowing up to 36 experiments
to be performed per day. To evaluate the outcome of a reaction, the
robot is equipped with real-time sensors: a flow benchtop nuclear
magnetic resonance (NMR) system¹³, a mass spectrometer and an
attenuated total-reflection infrared spectroscopy system¹⁴. These record
the spectra of the reaction mixtures. The robot then uses a support
vector machine¹⁵ (SVM) model with a linear kernel (Fig. 1c) to
automatically classify each reaction mixture as reactive or non-reactive,
reported in binary form as one or zero, respectively. This
algorithm compares the spectrum of the starting materials with that
recorded by the robotic platform using NMR and infrared spectroscopy,
registering differences as reactivity hits (see Fig. 1e for an example
of a non-reactive mixture and Fig. 1f for a reactive mixture). After
training on 72 reactive and non-reactive mixtures manually
classified by an expert chemist, the model could classify the reactivity
of reaction mixtures with an accuracy of 86%, as determined
by leave-one-out cross-validation. The machine learning algorithm
used to explore the chemical space needs an automatically generated
representation of the reactions¹⁶. Because the representation of the
data is crucial for machine learning¹⁷, we created a reaction descriptor
with a width corresponding to the number of starting materials in the
pool of reagents and with the bits representing reagents present
in a given reaction mixture set to one, similarly to one-hot encoding.
Figure 1d shows example vector representations for the model substrate
pool consisting of aniline, benzaldehyde, acetyl chloride,
phenylhydrazine and furan.
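
The reaction descriptor described above can be illustrated with a short Python sketch. It is only an illustration: the ordering of the five model substrates and the function name `encode_reaction` are assumptions made here, not taken from the authors' code.

```python
# Model substrate pool named in the text (Fig. 1d); the ordering is an assumption.
POOL = ["aniline", "benzaldehyde", "acetyl chloride", "phenylhydrazine", "furan"]


def encode_reaction(reagents, pool=POOL):
    """Return a binary vector with one bit per pool member (1 = present)."""
    present = set(reagents)
    unknown = present - set(pool)
    if unknown:
        raise ValueError(f"reagents not in pool: {sorted(unknown)}")
    return [1 if name in present else 0 for name in pool]


print(encode_reaction(["aniline", "benzaldehyde"]))           # [1, 1, 0, 0, 0]
print(encode_reaction(["benzaldehyde", "furan", "aniline"]))  # [1, 1, 0, 0, 1]
```

Because the vector records only which reagents are present, it carries no structural information about the molecules, which is what makes the representation structure-independent.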

This approach to representing chemical space renders it structure-independent
and allows the robotic platform to operate without prior
knowledge about reactivity and chemical structure (Fig. 2). Initially,
the chemical space was sampled by performing reactions with random
combinations of starting materials, evaluating their reactivity as
reactive or non-reactive using the SVM model (to determine expected
values of reactivity, Y) and encoding them in vector form (to obtain a
training set, X). The process of random selection is important because
the system avoids making prior assumptions about the possible reactivity
of the reagents, ensuring that the initial run results are unknown.
Even if the reaction mixture decomposes or is non-reactive, this information
is still useful for the navigation of the chemical space, allowing
real-time assessment of the reactivity of the starting materials. After
the reaction database has been built, a linear discriminant analysis
(LDA)¹⁸ model is trained on the data obtained to construct a model
of the chemical space. The remaining reactions are then rated by predicting
the probability of reactivity using the LDA model. This allows
for autonomous decision making, and the reaction with the highest
score is performed and analysed by the robotic system, thus avoiding
many non-reactive combinations and speeding up the search. The loop
is closed by updating the reaction database with the result of the last
experiment from the platform and then by retraining the LDA model
of the chemical space. The cycle is repeated until the required number
of reactions is performed or until the whole space, defined by a pool of
18 reactive, structurally diverse molecules containing functional groups
1-18 (Extended Data Fig. 1), is spanned. The chemical space consisted
of two- and three-component reactions formed from the pool of
starting materials, giving 969 possible experiments (153 two-component
and 816 three-component combinations). When LDA was
performed, the algorithm was able to clearly differentiate between reactive
and non-reactive combinations of the starting materials (Fig. 3a).
This means that the LDA can be useful for predicting new reactivity.
By taking this approach, we showed that the robot can learn how reactive
the starting materials are and efficiently navigate chemical space.
For example, the reaction mixture composed from 2-aminothiazole
(9), phenylacetyl chloride (15) and DBU (13) would be classified as
highly reactive, a mixture of malononitrile (3), methyl acetoacetate (18)
and DBU (13) as moderately reactive and a mixture of nitromethane
(4), benzofuroxan (7) and toluenesulfonylmethyl isocyanide (17) as
non-reactive. These assignments agree with basic chemical intuition,
demonstrating the predictive power of the model (see Supplementary
Information for the reactivity of all reactions according to the LDA
projection).
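
The closed loop described above (random sampling, training an LDA model, performing the highest-scoring untested reaction, retraining) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_experiment` is a hypothetical stand-in for the robot and its SVM verdict, the set of 'reactive' reagents in it is invented, and the 90 initial random experiments are taken from the Methods.

```python
from itertools import combinations

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

N_REAGENTS = 18  # size of the starting-material pool in the text


def encode(combo, width=N_REAGENTS):
    """Binary reaction descriptor: bit i is 1 if reagent i is in the mixture."""
    x = np.zeros(width, dtype=int)
    x[list(combo)] = 1
    return x


# All two- and three-component mixtures: C(18,2) + C(18,3) = 153 + 816 = 969.
space = [c for k in (2, 3) for c in combinations(range(N_REAGENTS), k)]
X_all = np.array([encode(c) for c in space])


def run_experiment(combo):
    """Hypothetical stand-in for the robot plus SVM verdict (1 = reactive)."""
    made_up_reactive = {0, 4, 8, 12, 14}  # invented for this demo only
    return int(len(made_up_reactive & set(combo)) >= 2)


rng = np.random.default_rng(0)
done = [int(i) for i in rng.choice(len(space), size=90, replace=False)]
labels = [run_experiment(space[i]) for i in done]

for _ in range(20):  # 20 model-guided experiments after the random phase
    lda = LinearDiscriminantAnalysis().fit(X_all[done], labels)
    tried = set(done)
    remaining = [i for i in range(len(space)) if i not in tried]
    p_reactive = lda.predict_proba(X_all[remaining])[:, 1]
    best = remaining[int(np.argmax(p_reactive))]  # highest-scoring candidate
    done.append(best)
    labels.append(run_experiment(space[best]))  # close the loop, then retrain

print(len(space), "possible experiments;", sum(labels[90:]), "of 20 guided picks reactive")
```

Retraining after every experiment mirrors the loop in the text; a batched variant would retrain only after several experiments have been added to the database.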

To further test the learning ability of our robotic system, we performed
simulations to calculate the number of reactive versus non-reactive
combinations of the starting materials chosen by the algorithm
during the exploration of the chemical space (Fig. 3b). In the initial
stage, the space was randomly sampled, resulting in an equal number
of reactive and non-reactive combinations being chosen by the algorithm.
After reaching the desired number of reactions, decisions
were made using LDA, leading to a rapid increase in the number of
reactive combinations being chosen by the algorithm. In the end, the
algorithm identified the empty part of chemical space; that is, the
last experiments that were chosen were non-reactive (Fig. 3b). The
accuracy of predicting the reactivity is shown in Fig. 3c, which shows
that as chemical space is progressively searched, the accuracy of the
prediction of the reactivity increases along with the confidence
intervals. This demonstrates that the robot can ‘self-learn’ using
artificial intelligence by exploiting this reactivity-first approach.
Additionally, the accuracy of the LDA classifier in predicting the reactivity
of the reaction mixtures was determined as 86% ± 3% using
five-fold cross-validation.

Fig. 1 | Automatic reaction detection with machine learning.
a, Schematic of the chemical robot. The circles are pumps and the coloured
dots are the positions of the valves. APCI, atmospheric pressure chemical
ionization; MS, mass spectrometer; ATR-IR, attenuated total reflectance
infrared spectrometer. b, Photograph of the chemical robot, showing
the pumps, reactors and real-time analytics, including the NMR, MS [...]

Fig. 2 | Overview of the artificial intelligence algorithm used for the
exploration of chemical space with the liquid-handling robot. The
liquid-handling robot performs reactions by choosing reactants from
the pool of starting materials. Online analytics is used for real-time
interpretation of reaction outcomes as reactive or non-reactive, and the
reaction database stores reaction outcomes. Machine learning is used to
build a model of the chemical space, recommend the next experiments and
control the robot.

Fig. 4 | Exploring the Suzuki-Miyaura reaction using machine
learning. a, The reaction space of the Suzuki-Miyaura reaction. Shown
[...] representation of the reaction for machine learning. b, Validation of
the predictive power of the model for a test set of 30% of the reactions
[...] machine-learning-controlled exploration of this reaction space. The
yellow bar shows the initial random choice of 10% of reaction space (576
reactions). The green bars show the next batches of 100 reactions chosen
by the machine learning algorithm. The error bars represent the standard
deviation within individual batches for Suzuki-Miyaura coupling.

Fig. 5 | Reactivity discovered with the machine-learning-driven
robot. a, Multicomponent reactions between methyl propiolate (16),
[...] cent. Light-grey boxes show calculated and measured (by electrospray
ionization mass spectroscopy, ESI-MS) molecular ion masses. b, ¹H
[...] c, Multicomponent reaction of DMAP (12), DMAD (1) and nitrobenzene
[...]

To further explore the predictive power of our approach, we also
investigated the Suzuki-Miyaura reaction space (see Fig. 4a) described
recently¹⁹ by searching for reactions with the highest yield with our
machine learning approach. To achieve this, we built a neural network
(for details and implementation, see Supplementary Information) and
used one-hot encoding to encode literature data for machine learning.
We then used the neural network to explore the hypothesis that
machine learning can be used for the prediction of yields. The dataset
was partitioned into a training set (3,456 reactions), a validation set
(576 reactions) and a test set (1,728 reactions) to train and validate the
neural network. When the neural network was tested, it performed
well, giving yields with a root-mean-square error of 11% for 1,728
reactions (see Fig. 4b for the correlation between real and predicted
yield). Having established that our approach can predict the yields of
Suzuki-Miyaura reactions, we performed a simulation to explore this
chemical space, as described above for our robot. Initially, the algorithm
randomly chose 10% of the reaction space (576 reactions) and then
the neural network was trained on these data. The unexplored parts
of the reaction space were then rated by the machine learning model,
the next batch of candidates with the best scores was selected, and the
true yield was evaluated. The initial random guess had a mean yield
of 39% and standard deviation (s.d.) of 27%, shown as a yellow bar
in Fig. 4c. The green bars show subsequent batches of 100 reactions
chosen by the machine learning algorithm. For example, the first batch
of 100 reactions had a mean yield of 85% and s.d. of 14%. The subsequent
batches contained progressively fewer reactive starting materials,
ultimately reaching non-reactive parts of the reaction space. This
approach is valuable because it shows that by realizing only 10% of the
total number of reactions, we can predict the outcomes of the remaining
90% without needing to carry out the experiments. Recently, the
application of machine learning to yield prediction and the navigation
of reaction space has been demonstrated for a Buchwald-Hartwig
amination²⁰ and deoxyfluorination with sulfonyl fluorides²¹, leading to
similar conclusions.
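
The numbers quoted for the Suzuki-Miyaura study can be checked, and the batch-selection step illustrated, with a few lines of Python. This is a hedged sketch: the helper names `rmse` and `next_batch` are hypothetical, the toy yields are invented, and the neural network itself (described in the Supplementary Information) is not reproduced here.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted yields."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def next_batch(predicted_yield, explored, batch_size=100):
    """Indices of the unexplored reactions with the highest predicted yield."""
    candidates = [i for i in range(len(predicted_yield)) if i not in explored]
    candidates.sort(key=lambda i: predicted_yield[i], reverse=True)
    return candidates[:batch_size]


# Partition sizes from the text: 60% training, 10% validation, 30% test.
train, val, test = 3456, 576, 1728
total = train + val + test
print(total, train / total, val / total, test / total)  # 5760 0.6 0.1 0.3

# Toy illustration with invented predicted yields for four reactions.
print(next_batch([0.1, 0.9, 0.5, 0.7], explored={1}, batch_size=2))  # [3, 2]
print(rmse([50, 70], [40, 80]))  # 10.0
```

The total of 5,760 reactions is consistent with the statement that the initial random 10% corresponds to 576 reactions.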

We used the reactive combinations discovered by the system to
manually carry out reactions. For example, by analysing the spectra
recorded by the robot, we identified several transformations (Fig. 5).
For instance, analysis of the ¹H NMR spectrum for the reaction of
methyl propiolate (16) with benzofuroxan (7) and DBU (13) suggests
an interesting transformation with new peaks visible in the chemical
shift range δ = 4.0-5.0 p.p.m. and 7.9-8.5 p.p.m. (Fig. 5b). Isolation and
NMR analysis of the reaction product showed that it contained protons
originating from all starting materials, suggesting that the compound
resulted from a multicomponent reaction. Analysis of the ¹H-¹³C
heteronuclear single-quantum and multiple-bond correlation spectra
determined the structure of product 19 (see Extended Data Fig. 2a for
a proposed mechanism).

[...] ratio. d, Solid-state structure of compound cis-20 (50% probability level).
e, Synthesis of chlorocyanonitrone (21) from nitrosobenzene (14) and
trichloroacetonitrile (5) in the presence of DBU (13). f, Newly discovered
reaction of phenylketene with DBU. g, Tanimoto similarity between
discovered reactions and 3.5 million known reactions. h, Histogram
showing the Tanimoto similarity index between the discovered reactions
and 3.5 million known reactions.

We explored the utility of this reaction by synthesizing a small
library of related molecules. By using substituted alkynes, we were able
to prepare six structurally diverse compounds in one step (Extended
Data Fig. 2b). Reaction of DMAD (1), nitrosobenzene (14) and
DMAP (12) led to a multicomponent reaction with formation of
2,5-dihydrofurane derivative 20 at a diastereomeric ratio of 2.4:1 (trans:cis)
(Fig. 5c, d). Figure 5e shows the formation of chlorocyanonitrone 21,
a member of an unreported class of nitrones, which was isolated as the
product of the reaction between trichloroacetonitrile (5) and nitrosobenzene
(14) in the presence of DBU (13) (structure of 21 confirmed by X-ray
analysis). Finally, we also found reactivity between ketenes and DBU
(Fig. 5f), indicated by the peaks at high molecular weight recorded
by the platform for this reaction (mass-to-charge ratio, m/z = 506.9
and m/z = 657). Under basic conditions, phenylacetyl
chloride (15) is deprotonated by DBU, giving phenyl ketene, which
reacts with DBU to give the polycyclic azepine derivative 22 (Fig. 5f).
The suggested mechanisms for these transformations are presented in
Extended Data Fig. 2c, d.

To assess how unique these reactions are, we employed the Tanimoto
similarity index, which compares starting materials and products²². We
considered over 40 million reactions, filtered by first excluding
non-organic reactions, then requiring the same number of reagents and
products as our discoveries, and finally by requiring that the reactions
have all the necessary structural information. This filtering left
about 3.5 million reactions to compare. For each reaction, we
calculated the similarity between each reagent and the product and
calculated the mean from the obtained values. For reactions in which
the reagents undergo a slight modification to reach the product, this
reaction similarity index would be close to 1. Conversely, if the reagents
change substantially so that the product is very different, then the result
would be close to 0. All four of the reactions discovered here
(see Supplementary Information) have a lower similarity index than
the mean. In fact, all are in the top 10 percentile, with reaction 2 (which
gives product 20) in the top 0.8 percentile (Fig. 5g), and they are
considerably more distinct than reactions chosen at random. The
histogram in Fig. 5h shows that there is only one peak in the distribution and
that the mean value of the Tanimoto similarity index is 0.29.
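
The reaction similarity index described above (the mean Tanimoto similarity between each reagent and the product) can be sketched in plain Python. The fingerprints below are hypothetical sets of on-bit indices chosen for illustration; the text does not state which fingerprint type was used, and the convention for two empty fingerprints is an assumption of this sketch.

```python
def tanimoto(a, b):
    """Tanimoto index of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def reaction_similarity(reagent_fps, product_fp):
    """Mean Tanimoto index between each reagent and the product."""
    return sum(tanimoto(fp, product_fp) for fp in reagent_fps) / len(reagent_fps)


# Hypothetical fingerprints, for illustration only.
reagents = [{1, 2, 3, 4}, {3, 4, 5}]
product = {1, 2, 3, 4, 9, 10}
print(round(reaction_similarity(reagents, product), 3))  # 0.476
```

Here the two reagent-product similarities are 4/6 and 2/7, so the reaction index is their mean, 10/21. A value near 1 would indicate a product that closely resembles its reagents, and a value near 0 a product that differs substantially from them.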

This study represents an important step towards developing intelligent
automated approaches to chemical discovery using
artificial-intelligence-driven chemical robots trained by human experts from
the bottom up, in contrast to top-down fragment-based approaches²³.


Online content 

Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0307-8.
METHODS


General experimental remarks. Reagents were from Sigma Aldrich and were
used as received. Acetonitrile employed as a solvent in the platform was HPLC
grade (VWR International). Mass spectra were recorded on a time-of-flight mass
spectrometer (MicroTOF-Q MS) equipped with an electrospray source supplied
by Bruker Daltonics Ltd. All data were collected in positive ion mode. The
spectrometer was calibrated with a standard tune mix to give a precision of about
1.5 p.p.m. in the region m/z = 100-3,000. NMR data were recorded using a Bruker
Avance III 600 MHz or a Bruker Avance 400 MHz NMR system. The spectra
were recorded at 298 K using residual-solvent proton peaks for scale reference (for
example, ¹H: δ (CDCl3) = 7.26; ¹³C: δ (CDCl3) = 77.16). The chemical shifts are
reported in p.p.m. using the δ scale and all coupling constants (J) are given in Hz.
The following abbreviations are used to characterize spin multiplicities: s, singlet;
d, doublet; t, triplet; q, quadruplet; m, multiplet; dd, double doublet; dt, double
triplet; dq, double quadruplet; and ddt, double doublets of triplets. Spectra obtained
using distortionless enhancement by polarization transfer, correlation spectroscopy,
heteronuclear single-quantum and multiple-bond correlation spectroscopy
and rotating frame Overhauser-effect spectroscopy were used for structure
determination and structural assignments. New reaction candidates were analysed
using thin-layer chromatography (TLC) and visualized using TLC plates with a
fluorescent indicator.

Syringe pumps and tubing. Control over the fluids was achieved using 27 pumps
(model C3000, TriContinent) equipped with 5 ml syringes (TriContinent) and a
four-way solenoid valve according to the requirements of the experiments. The
pumps were connected using an RS232 port and a daisy chain, allowing the
connection of up to 16 pumps on a single RS232 bus. Commands to the pumps were
sent using the pumps’ proprietary control language, implemented in a Python
module, allowing control over the pumps and error-reporting functionality (for
example, pumps malfunctioning). PTFE plastic tubing with an outer diameter of
1/8 inch (3.175 mm) was cut to the specified length and connected using standard
HPLC low-pressure PTFE connectors and PEEK manifolds (supplied by Kinesis).
Online attenuated total-reflectance infrared spectroscopy. All spectra were
recorded using a Thermo Scientific Nicolet iS5 Fourier transform infrared
spectroscopy system equipped with a ZnSe Golden Gate attenuated total reflectance
infrared flow cell. The resolution was set at 4 cm⁻¹ and each sample’s spectrum was
recorded using 36 scans. The spectrometer was controlled by OMNIC software
using Python and the ActiveX software framework. Before measurement of the
spectra, the solvent (MeCN) was recorded as background.

Online NMR spectroscopy. The NMR spectra were recorded using a Spinsolve
benchtop NMR system from Magritek with a compact permanent magnet
(43 MHz) based on the Halbach design, working on a lock-free basis (not
requiring deuterated solvents). Shimming was performed using a D2O/H2O mixture (9:1
v:v) to minimize the half-width of the solvent peak. To measure reaction mixtures,
the spectrometer was equipped with a home-built flow cell with a standard 5 mm
width to maximize sensitivity. The spectra were measured in a stopped flow by
pumping reaction mixtures into the flow cell. The spectrometer was controlled by
Spinsolve software by sending XML messages over a network connection.
Benchtop mass spectroscopy. The spectra were recorded with an Advion
Expression mass spectrometer using the atmospheric pressure chemical ionization
technique. The detailed acquisition parameters can be found in Supplementary
Information. The mass spectrometer was controlled using Python wrapper
software and Advion API, allowing complete control over the instrument and
acquisition parameters. Dilution of the reaction mixtures, which was necessary for
recording their spectra, was realized using two syringe pumps by diluting reaction
mixtures 3,125 times using solvent (MeCN) before the measurements.

Flow setup implementation. The platform was assembled as in Fig. 1a, using the
27 syringe pumps, the benchtop infrared spectroscopy system, the NMR and the
mass spectrometer. Round bottom flasks (25 ml) were employed as the mixer and
reactors. Of the pumps, 18 were responsible for dispensing the chemicals to the
mixer, six were used to transfer the reaction mixture from the mixer to the
appropriate reactor, one was employed to pump the solvent (MeCN), and two
were used to realize the dilution step that was necessary to measure mass spectra.
The starting materials were prepared as 1.0 M solutions. Automatic data collection
and processing and platform control were achieved using the Python programming
language. Before the execution of the reaction, the robot was cleaned three times
by flushing the mixer, reactor flasks and analytics. The reaction was performed by
adding the appropriate reagents to the mixer (total volume 5.0 ml) in a 1:1 ratio,
transferring the reaction mixture to the reactor and saving the reaction parameters
(the identity and volumes of the starting materials). After two hours, the reaction
mixture was transferred to the measurement loop, where the NMR and infrared
spectra were recorded. The mass spectrum was recorded after dilution of the
reaction mixture. After the reaction mixture had been measured, the mixer, reactor
and analytics were cleaned by flushing with solvent twice. Parallel execution of six
reactions was implemented by shifting the execution of each reaction in time so
that each experiment had access to the liquid-handling robot and analytics without
colliding with the other experiments. Spectra (NMR and infrared) were also
recorded for each chemical in the pool of starting materials (Extended Data Fig. 1);
these were used for the calculation of the theoretical spectrum of the reaction mixture.
Autonomous navigation of chemical space by the robot. The algorithm for the
exploration of chemical space starts by measuring 90 random experiments in the
platform, and then each experiment in this set is processed to assess its reactivity
and generate its vector representation. The ¹H NMR spectrum of the reaction
mixture is automatically processed using fast Fourier transform, phasing and
referencing of the solvent peak. The intensity of the solvent peak is normalized to 1.0
(the solvent peak is used as an internal standard, allowing easy addition of the
spectra). The infrared spectra are used without any preprocessing. Next, the theoretical
spectra of the reaction mixture (the sum of the starting materials) are constructed
for NMR and infrared spectroscopy. The spectra are normalized by removing the
mean and scaled to unit variance. The reactivity of the reaction mixture is assessed
by feeding the NMR reaction mixture and NMR theoretical spectrum to the SVM
classifier (trained previously; see Supplementary Information). The outcome of
the classifier is Y = 0 (non-reactive) or Y = 1 (reactive). Similarly, the reactivity
is assessed by the SVM classifier using the infrared spectra. An experiment is
classified as reactive if any of the above classifiers categorizes it as reactive. The
vector representation is generated using the identity of the starting materials. The
vector representation (X) and reactivity (Y) are added to the reaction database.
The machine learning algorithms are realized using the scikit-learn package
in Python²⁴. After the initial database of the reactions is built, the LDA classifier
is trained on the vector representation of the reactions (X) and their reactivity
(Y). All the possible unperformed reactions are then rated by assigning them the
probability of being reactive, as calculated from the LDA model. After the reactions
with the highest score are realized by the liquid-handling robot, they are
processed as described above, updating the reaction database. Then, the LDA model
is retrained on the updated database and the robot iteratively explores the chemical
space until the desired number of experiments is performed. Simulations of the
exploration of the chemical space with this algorithm were performed using the
data collected by the robot.
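
The reactivity assessment described in this section (theoretical spectrum as the sum of the starting materials, standardization, a linear-kernel SVM per technique, and an OR over the NMR and infrared verdicts) can be sketched with scikit-learn. This is a sketch under stated assumptions rather than the authors' pipeline: the spectra are synthetic and invented, the concatenation of measured and theoretical spectra into one feature vector is an assumed input layout, and all function names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
N_CHANNELS = 30
# Made-up pool: material j has one peak of height j + 1 at channel 5 * j.
POOL = np.zeros((5, N_CHANNELS))
for j in range(5):
    POOL[j, 5 * j] = j + 1.0


def standardize(spectrum):
    """Remove the mean and scale to unit variance."""
    s = np.asarray(spectrum, dtype=float)
    return (s - s.mean()) / s.std()


def make_features(measured, material_spectra):
    """Concatenate measured and theoretical spectra (assumed input layout)."""
    theoretical = np.sum(material_spectra, axis=0)  # sum of starting materials
    return np.concatenate([standardize(measured), standardize(theoretical)])


def synthetic_example(reactive):
    """Invented data: a reactive mixture gains a product peak at channel 27."""
    materials = POOL[rng.choice(5, size=2, replace=False)]
    measured = materials.sum(axis=0) + rng.normal(0.0, 0.01, N_CHANNELS)
    if reactive:
        measured = 0.5 * measured
        measured[27] += 5.0
    return make_features(measured, materials)


def train_classifier(n_examples=72):
    """Fit a linear-kernel SVM on labelled synthetic mixtures."""
    y = np.array([i % 2 for i in range(n_examples)])
    X = np.array([synthetic_example(bool(label)) for label in y])
    return SVC(kernel="linear").fit(X, y)


nmr_clf = train_classifier()  # one classifier per technique
ir_clf = train_classifier()


def classify(nmr_x, ir_x):
    """Y = 1 if either classifier calls the mixture reactive, else Y = 0."""
    nmr_hit = nmr_clf.predict([nmr_x])[0] == 1
    ir_hit = ir_clf.predict([ir_x])[0] == 1
    return int(nmr_hit or ir_hit)


print(classify(synthetic_example(True), synthetic_example(False)))
print(classify(synthetic_example(False), synthetic_example(False)))
```

The OR rule means that a change visible in only one technique is enough to flag a mixture as reactive, which favours sensitivity over specificity in the reactivity database.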
Syntheses of molecules discovered by the platform. The solutions of the starting
materials (1.0 M solutions in MeCN) were added to the round bottom flask
(25 ml) in a 1:1 ratio (total volume 5.0 ml) and stirred at room temperature for
2 h. Subsequently, silica gel (4.0 g) was added and the solvent was evaporated.
The products of the reaction were isolated using column chromatography. The
syntheses of all compounds were adjusted according to the need for each reaction.
For the detailed procedure followed for each compound and characterization, see
Supplementary Information.
Data and code availability. The data used for simulations of the exploration of 
chemical space are available in Supplementary Information. The code and data can 
be found online at https://github.com/croningp/reaction_learning. The data used 
for Suzuki-Miyaura coupling are available in ref. !°. 


24. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 
12, 2825-2830 (2011). 
Palaeoclimate reconstructions of periods with warm climates and 
high atmospheric CO2 concentrations are crucial for developing 
better projections of future climate change. Deep-ocean!” and 
high-latitude’? palaeotemperature proxies demonstrate that the 
Eocene epoch (56 to 34 million years ago) encompasses the warmest 
interval of the past 66 million years, followed by cooling towards 
the eventual establishment of ice caps on Antarctica. Eocene polar 
warmth is well established, so the main obstacle in quantifying 
the evolution of key climate parameters, such as global average 
temperature change and its polar amplification, is the lack of 
continuous high-quality tropical temperature reconstructions. Here 
we present a continuous Eocene equatorial sea surface temperature 
record, based on biomarker palaeothermometry applied on Atlantic 
Ocean sediments. We combine this record with the sparse existing 
data*® to construct a 26-million-year multi-proxy, multi-site stack 
of Eocene tropical climate evolution. We find that tropical and 
deep-ocean temperatures changed in parallel, under the influence 
of both long-term climate trends and short-lived events. This is 
consistent with the hypothesis that greenhouse gas forcing”® 
rather than changes in ocean circulation®!°, was the main driver of 
Eocene climate. Moreover, we observe a strong linear relationship 
between tropical and deep-ocean temperatures, which implies a 
constant polar amplification factor throughout the generally ice- 
free Eocene. Quantitative comparison with fully coupled climate 
model simulations indicates that global average temperatures 
were about 29, 26, 23 and 19 degrees Celsius in the early, early 
middle, late middle and late Eocene, respectively, compared to 
the preindustrial temperature of 14.4 degrees Celsius. Finally, 
combining proxy- and model-based temperature estimates with 
available CO2 reconstructions® yields estimates of an Eocene Earth 
system sensitivity of 0.9 to 2.3 kelvin per watt per square metre at 
68 per cent probability, consistent with the high end of previous 
estimates! 

It is well established that deep-ocean temperatures peaked during 
the Early Eocene Climatic Optimum (EECO; about 52-50 million 
years (Myr) ago) and had declined substantially by the latest Eocene 
(about 34 Myr ago)!”. These trends are mimicked in reconstructions of 
sea surface temperature (SST) in the southern high latitudes? because 
Eocene deep-ocean temperatures reflect Southern Ocean winter 
surface conditions that are relayed to the abyss through deep-water 
formation’. However, to unlock the unique promise of Eocene palaeo- 
climate records to answer fundamental questions about the relationship 
between atmospheric CO2 concentrations and global temperature, and 
to quantify the polar amplification of climate change, accurate recon- 
structions of tropical surface oceans are required. Moreover, tropical 
records are necessary to test the two competing hypotheses for Eocene 
deep-ocean and polar cooling: (1) decreasing greenhouse gas concentrations, 
predominantly CO2 (refs ”), and (2) changes in ocean circulation and 
meridional heat transport associated with opening 
of ocean gateways”"”. In theory, gateway opening cools the Southern 
Ocean and deep ocean while warming the upper tropical ocean by a 
few degrees!*, whereas CO2 decline leads to global cooling at both the 
Equator and the poles'*—albeit with predicted amplified polar temper- 
ature change relative to the tropics!>. Importantly, this amplification 
factor affects the volume and extent of ice sheets, and thus the global 
sea level, and is therefore critical to constrain, also for future projec- 
tions. Yet, despite evidence for CO2 decline over the Eocene’, existing 
tropical records* are fragmentary and of low resolution, and therefore 
insufficient to address these crucial questions. 

We generated new temperature reconstructions using a clay-bearing, 
micritic porcellanite sequence recovered at Ocean Drilling Program 
(ODP) Site 959 in the eastern equatorial Atlantic Ocean (Fig. 1). Site 
959 was positioned at near-equatorial latitudes and deep-bathyal water 
depths throughout the Eocene’® (Extended Data Table 1a). We augment 
the existing age model’® with new biostratigraphic and chemostrati- 
graphic constraints (Extended Data Table 1b, Extended Data Fig. 1). 
Although carbonate preservation is poor’®, well preserved, immature 
organic matter is present throughout!”. We therefore employ the 
organic TEX86 palaeothermometer, which utilizes the temperature-dependent 
distribution of thaumarchaeotal membrane lipids to reconstruct 
SST. Fractional abundances of the various lipids at Site 959 indicate 
an upper water column (50-300 m) source (Methods), which allows 
confident SST interpretations from TEX86. Several calibrations exist to 
translate TEX86 data into SSTs on the basis of a modern core-top 


Fig. 1 | Palaeogeographic reconstruction of the studied sites 40 million 
years ago. The figure shows the approximate palaeoposition of the studied 
site (ODP Site 959) and the main sites that we used to produce a tropical 
SST compilation: ODP sites 865, 925 and 929; Tanzania Drilling Project 
(TDP); Sagamu Quarry (SQ) and IB10B Core, Nigeria. Continental 

plates are shown in dark grey. Light-grey gridlines represent latitudes and 
longitudes, with 30° spacing. The map was generated with GPlates, using 
the rotation frame and tectonic reconstruction of Matthews et al.°*°. 
Fig. 2 | Eocene global climate evolution. a, CO2 record from boron 
isotopes from the TDP (orange squares; error bars represent 68% 
confidence intervals) and alkenones from ODP Sites 612 and 925 (yellow 
and orange circles; uncertainties from original studies); data sources are 
provided in Methods. b, TEX86H-based SST record for Site 959 (red) and 
additional tropical compilation (pink; see Extended Data Fig. 5). The 


dataset'®. For biophysical and analytical reasons, we prefer conservative 
estimates of tropical temperature generated by the logarithmic TEX86H 
calibration!’ (Methods, Extended Data Fig. 2). In addition, we use a 
linear Bayesian spatially varying regression (BAYSPAR) calibration’® 
as complementary analysis (Extended Data Fig. 3). 

Our new equatorial record from Site 959 (Fig. 2) shows latest-Palaeocene 
(about 58-56 Myr ago) SSTs of 31-33 °C, mimicking time-equivalent 
SSTs derived from glassy preserved planktonic foraminiferal oxygen 
isotope (δ18O) and Mg/Ca ratios, as well as TEX86H data from a nearby 
section in Nigeria”°, supporting the notion that TEX86H accurately 
reflects SST at Site 959. The record further reveals warming by 2-3°C 
from the latest Palaeocene to the earliest Eocene (58 to 53 Myr ago) to 
peak EECO temperatures of 34-35 °C. Superimposed transient warm- 
ing of around 4°C to around 37°C occurred during the Palaeocene- 
Eocene Thermal Maximum (PETM), about 56 Myr ago”!. A long-term 
SST drop of about 7 °C to about 28°C characterizes the interval from 
the middle to late Eocene, and an additional cooling by around 2°C to 
around 26°C marks the Eocene-Oligocene transition (about 34 Myr 
ago). Superimposed on long-term cooling is the first tropical SST 
reconstruction of the Middle Eocene Climatic Optimum”? (MECO), 
at roughly 40 Myr ago, displaying warming by around 4°C from back- 
ground temperatures to a peak of about 33°C. This provides compelling 
evidence that the MECO was associated with global warming; surface 
warming was previously only recognized in extratropical regions of the 
Southern Hemisphere. We also record pre-MECO temperature variability 
of similar duration but lower amplitude. 

To assess whether regional upwelling” at Site 959 influenced TEX86-based 
SST variability, we consider published total organic carbon 


dashed line represents a hiatus. Green diamonds (Site 1172) show a high-latitude 
TEX86-based SST record’. c, δ18O-based ice-free deep-ocean 
temperature (described in Methods), with fitted LOESS model (black line) 
and 95% confidence interval (dark-blue shading). Age follows the 
Geologic Time Scale 2012 (GTS2012). Pal., Palaeocene; Olig., Oligocene; 
EOT, Eocene-Oligocene transition. 


(TOC) contents!” of sediments and generate dinocyst assemblage 
data, as dinocysts are highly sensitive to upwelling in modern and 
Palaeogene oceans” (Extended Data Fig. 4). The continuous presence 
of cysts of Protoperidiniaceae (derived from heterotrophic dinoflag- 
ellates) and elevated TOC within biosiliceous sediments!® indicate 
upwelling throughout the middle and late Eocene. The early Eocene is 
less well constrained, but presence of Protoperidiniaceae and abundant 
biosilica also suggests upwelling. An upper-Eocene increase in TOC 
content!” might indicate upwelling intensification. Although this may 
exaggerate latest-Eocene cooling at Site 959, the recorded magnitude 
(about 2°C) is similar to previous work at tropical locations”* (Fig. 2). 
Apart from the late Eocene, however, variations in our SST record are 
not strongly correlated to changes in the abundance of upwelling- 
indicative dinocysts or TOC content. Regional upwelling may have 
muted SSTs by a few degrees. Indeed, our values are somewhat lower 
than the few time-equivalent data points from the warm pool sampled 
in Tanzania‘, suggesting that we sampled the first Eocene analogue 
to the ‘cold tongue’ in the modern ocean. Importantly, this analysis 
indicates that variations in the strength of upwelling were not a major 
factor governing SST change at the study site. 

We combine our equatorial Site 959 SST record with the available 
low-resolution data derived from a suite of SST proxies from the Indian, 
Atlantic and Pacific tropical oceans (Fig. 2; data sources in Extended 
Data Fig. 5). Because each of these proxies is subject to different sys- 
tematic sources of error, the close correspondence between various 
organic and carbonate proxies in both absolute temperatures and trends 
indicates a robust convergent temperature signal. A local regression 
(LOESS) model is applied to the resulting compilation to produce an 
Fig. 3 | Proxy-model synthesis of Eocene temperatures. a, Top, tropical 
SST compilation (red) and LOESS model (black line) with 95% confidence 
interval (grey shading). Bottom, deep-ocean temperatures from Fig. 2c. 
Open squares are mean modelled tropical SSTs and deep-ocean temperatures 
of simulations EO1 (560 p.p.m. CO2), EO2 (1,120 p.p.m. CO2), EO3 


estimate of Eocene mean tropical temperature (Fig. 3a), yielding 4-7 °C 
of cooling through the Eocene. Remarkably, long-term trends and 
sub-million-year (MECO and PETM) tropical SST variations mimic 
those from the Southern Ocean and the deep ocean, on the basis of an 
updated compilation of benthic foraminiferal δ18O-derived temperatures 
(Fig. 2). A sensitivity study indicates that potential late Eocene 
Antarctic ice caps did not appreciably affect this deep-ocean tempera- 
ture proxy (Methods, Extended Data Fig. 6). The close correspondence 
between tropical and deep-sea temperatures provides solid proof that 
greenhouse gas forcing, rather than ocean circulation change, caused 
Eocene cooling, as has been suggested elsewhere””*. 

As an approximation of the pole-to-Equator temperature difference, 
or meridional temperature gradient (MTG), we calculate the difference 
between tropical mean SST and deep-ocean temperatures (Methods, 
Fig. 3b). Although different TEXg¢ calibrations result in slightly differ- 
ent early Eocene MTGs (Extended Data Fig. 3), the gradient generally 
increases with cooling climate and vice versa, reflecting polar ampli- 
fication of temperature variability. Remarkably, regression analysis 
indicates a strong linear relationship between deep-ocean and tropical 
temperatures (Fig. 4a; also between high-latitude and tropical SSTs in 
Extended Data Fig. 7). Although uncertainty on the exact value is large 
owing to uncertainties in temperature proxies and calibrations (Fig. 4b, 
Extended Data Fig. 3), this signifies a stable polar amplification factor 
throughout the Eocene. Because the obtained values are consistent with 
polar amplification derived from an analysis of the PETM event”? with 
better spatial resolution, this observation holds for both short (10°-yr) 
and long (multi-million-year) timescales. In the absence of pronounced 
snow and ice albedo feedbacks, the polar amplification factor should be 
determined by atmospheric feedbacks”. Therefore, the stable amplifi- 
cation factor implies that the strength of these feedbacks scales linearly 
with temperature in an ice-free world. 
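
The two quantities used in this paragraph, the MTG approximation (tropical mean SST minus deep-ocean temperature) and the slope of deep-ocean against tropical temperature, can be illustrated with a short sketch. The numbers below are invented purely for illustration and are not data from this study; treating the regression slope as the amplification factor is an assumption about how the factor is defined, and the function names are hypothetical.

```python
def meridional_gradient(tropical_sst, deep_ocean_temp):
    """Approximate MTG as tropical SST minus deep-ocean temperature, per time slice."""
    return [t - d for t, d in zip(tropical_sst, deep_ocean_temp)]


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic, illustrative temperatures in degrees Celsius (warm to cool)
tropical = [34.0, 32.0, 30.0, 28.0]
deep = [12.0, 9.0, 6.0, 3.0]

mtg = meridional_gradient(tropical, deep)
slope, intercept = ols_slope_intercept(tropical, deep)
```

With these made-up values the gradient widens as the climate cools, and a slope above 1 means the deep ocean (standing in for the poles) changes more than the tropics. A constant slope across the whole range is what a stable amplification factor looks like in this representation.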

Our temperature proxy compilations provide a concrete and robust 
test of the ability of models to reproduce past warm climates under 
increased greenhouse gas forcing. We performed fully coupled gen- 
eral circulation model simulations using the NCAR Community Earth 
System Model, version 1 (CESM 1), by applying a range of radiative 
forcings equivalent to a range of Eocene CO2 concentrations (560 
parts per million (p.p.m.), 1,120 p.p.m., 2,240 p.p.m. and 4,480 p.p.m.; 
simulations EO1-EO4, respectively; see Methods), each run to full equilibrium. 
Because the close correspondence between tropical, high-latitude 
and deep-sea temperature trends (Fig. 2) supports model-based infer- 
ences that Eocene global mean temperature was relatively insensitive 
(2,240 p.p.m. CO2), EO4 (4,480 p.p.m. CO2) and EO_CP”*; errors represent 
seasonal range. Yellow shadings illustrate age ranges to which the simulations 
are matched. b, Calculated MTG based on LOESS fits of proxy data (line, 
propagated 95% confidence intervals) and model simulations EO1-EO4 
and EO_CP (as in a). Age follows GTS2012. 


to variations in palaeogeography”'*”*, we did not vary the palaeogeographic 
boundary conditions. The modelled deep waters derive primarily 
from polar surface waters’, justifying our use of the modelled and 
proxy-derived vertical gradient as an approximation for the MTG. The 
four simulations, EO1-EO4, were associated with specific age ranges by 
matching the simulated deep-ocean temperatures to the proxy-based 
deep-ocean temperatures, thus leaving SST as the predicted variable. 
Crucially, the simulations closely approximate the multi-proxy, multi- 
location tropical SST compilation for these four time slices (Fig. 3a). 
Therefore, the Eocene temperature gradients of 19-26°C, which are 
reconstructed from proxies, are also closely reproduced (Fig. 3b). 
This implies that current-generation climate models are capable of 
resolving the low-temperature-gradient problem" of Eocene green- 
house climates, provided sufficient greenhouse forcing, albeit with two 
important exceptions. First, regional proxy-model data mismatches for 
absolute temperatures in the South Pacific’? and Arctic?” oceans remain 
a conundrum, which this study does not resolve. Second, the model 
simulations do not fully reproduce the most reduced proxy-derived 
gradients of the early Eocene. On the basis of recent modelling experi- 
ments with tuned cloud parameters”®, one potential explanation could 
be that the early Eocene hothouse experienced different cloud behav- 
iour and shortwave radiative feedbacks (simulation EO_CP in Fig. 3b). 
Although a simulation with tuned clouds produces a more reduced 
early Eocene MTG at lower CO2 concentrations, the same parameters 
lead to a poorer simulation of the MTG during the PETM”S (Extended 
Data Fig. 8), indicating that this remains an unresolved problem. 
With the overall excellent agreement between ocean temperature 
proxy reconstructions and model simulations, we can use the latter 
to estimate global mean temperatures, which are required to calculate 
climate sensitivity to CO2 forcing. Global mean temperatures were 
about 29°C, 26°C, 23°C and 19°C during the early (54-49 Myr ago), 
early middle (48-46 Myr ago), late middle (42-41 Myr ago) and late 
Eocene (38-35 Myr ago), respectively, compared to a preindustrial 
temperature of 14.4°C. These may be slightly underestimated if South 
Pacific®!* and Arctic”’ temperature reconstructions represent accu- 
rate estimates of annually averaged SST. However, our model requires 
much larger changes in CO2 to produce the large and dynamic range of 
Eocene tropical SST and deep-sea temperature than that reconstructed 
from proxy data®. This implies that the Earth system sensitivity’! to 
CO2 doubling derived from the model (3.5 °C) is too low to create 
sufficient warmth. We consider available Eocene CO2 reconstructions® 
in combination with our proxy- and model-based temperatures 
(Methods) to estimate the Earth system sensitivity at various parts 
of the Eocene (Extended Data Fig. 9). Our probabilistic analysis for 
the cooling between the early and late Eocene results in a calculated 
proxy-based Earth system sensitivity range of 0.9-2.3 K (W m⁻²)⁻¹ (68% 
highest density interval, equivalent to 3.5-8.9°C per CO2 doubling), 
consistent with the high end of previous estimates''. 

The large range of Eocene tropical temperatures on both short and 
long timescales indicates that the tropics respond strongly to changes 
in greenhouse gases, even at high temperatures. In addition to high 
absolute temperatures of up to about 35°C and 37°C during the EECO 
and PETM”|, respectively, this refutes the notion of stable tropical tem- 
peratures‘, kept constant through a physical ‘thermostat’ mechanism”. 
Moreover, our results show that tropical SST varied in tandem with 
high-latitude and deep-ocean temperatures, with a stable Eocene polar 
amplification factor, consistent with a dominant role of CO2 forcing 
in both long-term Eocene climate evolution and superimposed aber- 
rations including the PETM and MECO. Tropical temperatures are 
expected to rise in response to anthropogenic greenhouse gas emis- 
sions. Given the consistency between our climate simulations and 
reconstructions, current-generation fully coupled climate models are 
likely to perform adequately in predicting future tropical SST change, 
although accurate determination of the sensitivity of global climate to 
CO2 change remains a major challenge. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
METHODS 


Palynology. Freeze-dried sediments (96 samples) were crushed and treated with 
30% HCl and twice with 38%-40% HF to remove carbonates and silicates, respec- 
tively, after a known amount of Lycopodium spores (batch number 1031; 20,848 
spores per tablet) was added to enable absolute quantification of palynomorphs. 
A 15-250 µm fraction was isolated using nylon mesh sieves and an ultrasonic 
bath. No oxidation procedure was applied. An aliquot of homogenized residue was 
mounted on slides and analysed using light microscopy (400× magnification) to a 
minimum of 200 identified dinocysts. 
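
Absolute quantification with an exotic marker such as Lycopodium conventionally scales the counted palynomorphs by the ratio of markers added to markers counted. The sketch below shows that standard marker-grain arithmetic. The function name, the sample weight, the counts and the assumption of one tablet per sample are illustrative and not taken from this study; only the figure of 20,848 spores per tablet comes from the text.

```python
SPORES_PER_TABLET = 20848  # batch 1031, as stated in the text


def palynomorph_concentration(counted, markers_counted, markers_added, dry_weight_g):
    """Palynomorphs per gram of dry sediment by the marker-grain method.

    concentration = counted * markers_added / (markers_counted * dry_weight_g)
    """
    if markers_counted <= 0 or dry_weight_g <= 0:
        raise ValueError("marker count and dry weight must be positive")
    return counted * markers_added / (markers_counted * dry_weight_g)


# Hypothetical sample: 200 dinocysts and 100 Lycopodium spores counted,
# one tablet added (an assumption), 10 g dry sediment.
conc = palynomorph_concentration(200, 100, 1 * SPORES_PER_TABLET, 10.0)
```

Because the marker count sits in the denominator, slides with very few marker spores give unstable estimates. This is one reason for counting to a fixed minimum.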

Organic geochemistry. Lipids were extracted from freeze-dried and pow- 
dered sediments (5-25 g dry weight, 118 samples) with dichloromethane 
(DCM):methanol (MeOH) (9:1, v/v) using a Dionex accelerated solvent extractor 
(ASE 350) at a temperature of 100°C and a pressure of 7.6 × 10⁶ Pa. Lipid extracts 
were separated into an apolar, ketone and polar fraction by Al2O3 column chromatography 
using hexane:DCM (9:1), hexane:DCM (1:1) and DCM:MeOH (1:1) 
as respective eluents. 99 ng of a synthetic C46 (mass-to-charge ratio, m/z = 744) 
glycerol dialkyl glycerol tetraether (GDGT) standard was added to the polar fraction, 
which subsequently was dissolved in hexane:isopropanol (99:1, v/v) to a concentration 
of ~3 mg ml⁻¹ and passed through a 0.45-µm polytetrafluoroethylene 
filter. This fraction was then analysed by high-performance liquid chromatography 
(HPLC) and atmospheric pressure chemical ionization-mass spectrometry 
using an Agilent 1260 Infinity series HPLC system coupled to an Agilent 6130 
single-quadrupole mass spectrometer at Utrecht University following Hopmans 
et al.*! to measure the abundance of GDGTs. The branched and isoprenoid tetraether 
(BIT) index and TEX86 values were calculated according to Hopmans et al.*? 
and Kim et al.!8, respectively. Based on long-term observation of the in-house 
standard, the analytical precision for TEX86 is ±0.3 °C. 

GDGT distributions. Of the 118 samples analysed for GDGTs, 5 early Eocene 
samples did not yield sufficient concentrations of GDGTs to determine TEX86. 
Additionally, 4 samples were excluded because either GDGT-2 (2 samples) or crenarchaeol 
(2 samples) could not be reliably identified. For the remaining 109 
samples we evaluated the sources of GDGTs and the reliability of TEX86. The BIT 
index’, a means of quantifying the abundance of soil- and river-derived 
GDGTs relative to marine GDGTs, is low throughout the entire Eocene (all <0.25, 
with 90% of values <0.07) and there is no significant correlation between BIT index 
and TEX86 (P > 0.3). Thus, our TEX86 values are probably not biased by terrestrial 
input. Both the methane index and GDGT-2/Cren ratio show normal marine 
values (<0.20 and <0.12, respectively), so there is no indication of a high abundance 
of methanotrophic archaea relative to Thaumarchaeota***4. Furthermore, 
GDGT-0/Cren is low (<0.31), so there are no indications of enhanced contributions 
of methanogenic archaea to the pool of isoprenoid GDGTs used in TEX86°°. 
Finally, GDGT-2/GDGT-3 ratios are <4.5, ruling out substantial impact of 
deep-water production of GDGTs**. Together, these ratios indicate that GDGT 
distributions were probably not considerably affected by either GDGT-producing 
soil bacteria, methanotrophic or methanogenic archaea, or deep-dwelling 
Thaumarchaeota, thereby designating upper-water-column Thaumarchaeota as 
the main source and favouring the interpretation of TEX86H as an SST proxy”. 
Another recently described ratio focuses on the different GDGT distributions 
produced by modern Thaumarchaeota in the Red Sea’. Based on core-top datasets, 
fractional abundances of Red Sea GDGTs are known to differ from other oceanic 
settings, notably in containing relatively more crenarchaeol regio-isomer (Cren’)**. 
This causes a different relationship between TEX86 and SST. Inglis et al.° proposed 
the %GDGTRS, [Cren’/(GDGT-0 + Cren’)] × 100%, as a means of evaluating 
whether a ‘Red Sea-type’ GDGT distribution was present in the geological record. 
In our tropical Eocene record, TEX86 is strongly driven by fractional abundance 
of Cren’; therefore there is also a strong correlation between TEX86 and %GDGTRS. 
However, as Inglis et al.5 noted, this Red Sea GDGT distribution cannot be distinguished 
from a high-temperature (>30°C) distribution, so %GDGTRS cannot 
disentangle the effects of high-temperature versus Red Sea-type GDGT distribu- 
tions at this site. However, we note that the several reasons that have been proposed 
for the aberrant Red Sea GDGT distribution are not likely to have played a role at 
Site 959. There is no environmental similarity between Eocene Site 959 and the 
modern Red Sea that could account for a similarly adapted population of endemic 
Thaumarchaeota, as the setting is not comparable oceanographically or geomor- 
phologically. Furthermore, our dinocyst record shows no indication of high salinity 
or strong stratification throughout the record. On this basis, we conclude that there 
is no reason to assume a similar relationship between TEX86 and SST for Site 959 
and the modern Red Sea. Finally, we note that our new equatorial record shows 
late Palaeocene TEX86H SST estimates of 31-33 °C, identical to time-equivalent SSTs 
derived from the δ18O and Mg/Ca ratios and TEX86H of glassy preserved Morozovella 
acuta from nearby sections in Nigeria”’, confirming accurate proxy-estimated SSTs 
at the study site. 

TEX86 calibrations. Calibrations. Different calibrations have been proposed 
to translate TEX86 into SST. Of note is also a recent paper by Ho and Laepple*’, 
who propose that the sedimentary GDGTs derive from the deep ocean and TEX86 
therefore reflects deep (>500 m) subsurface temperatures rather than SST. 
However, their conclusions are controversial, as their assumptions are inconsistent 
with all modern-ocean and microbiological evidence and the statistical method 
used is questionable*’. Within the TEX86-to-SST calibrations, a first division can 
be made between calibrations based on core-top samples and those based on mesocosm 
experiments. Here, we focus on applying different calibrations based on 
core-top datasets'*“!, as these implicitly include ecological, water-column and 
diagenetic effects that are not incorporated in mesocosm experiments. Several 
linear and nonlinear core-top calibrations have been developed. Of these, the global 
nonlinear (logarithmic) TEX86H calibration of Kim et al.'8 and the BAYSPAR TEX86 
calibration of Tierney and Tingley’! are particularly applicable and most commonly 
chosen for higher-temperature settings, such as the Eocene. By treating 
TEX86 as the dependent variable, BAYSPAR is the only calibration that does not 
suffer from regression dilution bias. For these calibrations, the differences in absolute 
temperature and relative temperature change in studies reporting TEX86 values 
between 0.5 and 0.75 are mostly within the error of the proxy!*!"". Significant 
differences only appear with TEX86 values above those occurring in modern oceans 
(that is, TEX86 > 0.73) for which the TEX86-to-SST calibration has to be extrapolated. 
This is illustrated in Extended Data Fig. 2a, which shows that SST estimates 
based on the TEX86H and BAYSPAR calibration for Site 959 are within error between 
TEX86 values of 0.67 and 0.80. However, the difference between the calibrations 
increases at higher TEX86 values. For assessing temperature change in a high-temperature 
setting such as the equatorial Eocene, the choice of calibration therefore 
becomes an important factor. 

Biophysical considerations. For the modern ocean, a linear calibration results in a 
better statistical correspondence between TEX86 and SST in the temperature range 
of 5-30°C!®?. However, the question remains as to whether a linear calibration 
is the best choice for much warmer Eocene oceans considering the biochemical 
mechanism underlying the TEX86-SST relationship. Hyperthermophilic archaea in 
culture synthesize an increasing proportion of GDGTs with an increasing number 
of cyclopentane moieties with increasing temperature***’, probably as a homeoviscous 
adaptation of the cell membrane**. However, the GDGTs included in the 
TEX86 ratio (GDGT-1 to GDGT-3 and the crenarchaeol isomer; see equation (1)) constitute a 
minor part of the membrane lipids of Thaumarchaeota. The dominant GDGTs are 
GDGT-0 and crenarchaeol*””. Indeed, in the global core-top dataset, higher crenarchaeol 
and lower GDGT-0 are recorded with higher temperatures, although their 
response is less strong than that of the GDGT isomers included in TEX86 (Fig. 4 
in Kim et al.!8). Thus, TEX86 does not capture the full membrane adaptation of 
Thaumarchaeota to changing temperatures. Interestingly, the ratio of crenarchaeol/ 
GDGT-0 versus TEX86 shows a strongly nonlinear relationship in the global core-top 
data (Extended Data Fig. 2c). This trend is similar to that observed between 
TEX86 and the ring index (RI; Extended Data Fig. 2d), which is the average 
number of cyclopentane rings of GDGTs 0-3, crenarchaeol and its regio-isomer 
(see equation (2)) and which also shows a strong relationship to temperature”. 


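$$\mathrm{TEX}_{86} = \frac{[\text{GDGT-2}] + [\text{GDGT-3}] + [\text{Cren}']}{[\text{GDGT-1}] + [\text{GDGT-2}] + [\text{GDGT-3}] + [\text{Cren}']} \qquad (1)$$

$$\mathrm{RI} = 0 \times [\text{GDGT-0}] + 1 \times [\text{GDGT-1}] + 2 \times [\text{GDGT-2}] + 3 \times [\text{GDGT-3}] + 4 \times [\text{Cren} + \text{Cren}'] \qquad (2)$$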

In Extended Data Fig. 2c, d the Eocene data from Site 959 overlap the core-top 
dataset, on both Red Sea and tropical latitude core-top data. This nonlinear 
relationship indicates that at high temperatures, TEX86 shows a relatively small 
response to temperature change relative to the amount of crenarchaeol versus 
GDGT-0 and RI. This suggests that with increasing temperatures, adaptation of 
the thaumarchaeotal membrane is increasingly regulated through crenarchaeol 
and GDGT-0 rather than the GDGTs included in TEX86. This should lead to a 
flattening of the slope between SST and TEX86 and therefore supports a logarithmic 
relationship. Additional support for this hypothesis comes from recent 
culturing experiments on three different Thaumarchaeota strains’. These show 
that for two strains, Nitrosopumilus maritimus and strain NAOA6, both TEX86 and 
RI (mainly driven by GDGT-0 and crenarchaeol) correlate with the incubation 
temperature. However, in the third strain (NAOA2), RI, but not TEX86, changes 
with growth temperature. This third strain had the highest growth temperature 
optimum and the strongest change in RI from 28°C to 35°C. This suggests that 
at temperatures >28°C, membrane adaptation to temperature in certain (high-temperature) 
Thaumarchaeota may not be well reflected in the TEX86 ratio. It 
should be noted that no nonlinear response was found in mesocosm experiments” 
at temperatures of up to 40°C. However, this calibration is substantially different 
from that of the global core-top calibrations owing to the unusually low amounts 
of the crenarchaeol regio-isomer™. A similarly low abundance of the crenarchaeol 
regio-isomer was noted for Nitrosopumilus maritimus and strain NAOA6™®. In 
strain NAOA2”, abundances of crenarchaeol regio-isomer were higher and did 
increase with temperature, suggesting that it may be a better representation for 
high-temperature-adapted marine Thaumarchaeota. On the basis of the above 
biophysical evidence, we argue that the slope of the TEX86-to-temperature curve 
is likely to flatten at temperatures above those covered by the surface sediment dataset, 
such as those implied by the TEX86 values recorded in the Eocene of Site 959. 

Implications. The use of the nonlinear TEX86H calibration results in lower temperature 
estimates compared to the linear BAYSPAR calibration (Extended Data 
Fig. 2b) for Site 959 in the early Eocene and late Palaeocene. Notably, for the 
Palaeocene, such estimates agree better with SSTs derived from glassy preserved 
planktonic foraminiferal δ18O and Mg/Ca records from nearby sections in 
Nigeria”®. Additionally, the Site 959 TEX86H estimates fit well with the other SST 
constraints that we use in our tropical Eocene compilation (Extended Data Fig. 5). 
Finally, the similarly reduced sensitivity of TEX86 (that is, nonlinearity) at the low 
end of the temperature range is undisputed because it is apparent in both meso- 
cosm experiments” and in the global core-top dataset!**?. We therefore apply the 
TEXi! calibration in our main analysis, which is presented in the main text. 
Nevertheless, the absolute temperature estimates and magnitude of change 
obtained from the extrapolated part of the TEX₈₆ calibration curve should always
be interpreted with care. For completeness, we also present the results for the MTG 
and polar amplification analysis using BAYSPAR in Extended Data Fig. 3. This 
confirms that the use of TEX₈₆ᴴ instead of BAYSPAR gives a conservative estimate
of middle–late Eocene cooling and MECO warming at Site 959 and thus a low
estimate of (early Eocene) MTGs and a maximum estimate of polar amplification 
compared to BAYSPAR. Crucially, however, the choice of calibration does not affect 
the trends in tropical surface temperatures (Extended Data Fig. 3a) or the fact that 
they parallel deep-ocean temperatures, and therefore does not affect our conclusion 
regarding the drivers of Eocene climate change. The larger Eocene range of trop- 
ical temperatures reconstructed using BAYSPAR does imply that SSTs at Site 959 
varied more than deep-sea temperatures during the Eocene, suggesting tropical 
rather than polar amplification. Regardless, the relation between tropical and deep- 
sea temperatures remains linear, reflecting a constant polar amplification factor 
(Extended Data Fig. 3c). 
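
The flattening of the logarithmic calibration at high index values can be made concrete with a minimal sketch. It assumes the commonly quoted form of the logarithmic core-top calibration, SST = 68.4 × log₁₀(TEX₈₆) + 38.6; these coefficients are not stated in this section and are an assumption here, and the function names are hypothetical.

```python
import math


def tex86h_sst(tex86: float) -> float:
    """SST (deg C) from the assumed logarithmic calibration
    SST = 68.4 * log10(TEX86) + 38.6."""
    if tex86 <= 0:
        raise ValueError("TEX86 must be positive")
    return 68.4 * math.log10(tex86) + 38.6


def tex86h_slope(tex86: float) -> float:
    """Local sensitivity dSST/dTEX86 (deg C per TEX86 unit) of the same curve."""
    return 68.4 / (tex86 * math.log(10))


# The slope shrinks as TEX86 rises, so a given change in the index maps to a
# smaller temperature change at the warm end of the calibration.
for tex in (0.6, 0.8, 0.95):
    print(f"TEX86={tex:.2f}  SST={tex86h_sst(tex):5.1f} C  "
          f"slope={tex86h_slope(tex):5.1f} C per unit")
```

Because the slope is inversely proportional to TEX₈₆, any linear calibration will return a larger temperature change than this curve for the same change in the index at high values.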

Age model Site 959. Eocene sediments from Site 959 were too weakly magnetized 
to yield reliable palaeomagnetic results'®. Our age model is therefore based on a 
combination of bio- and chemostratigraphy and supported by cyclic variations in 
sediment coloration (Extended Data Fig. 1). Although dinoflagellate cyst assem- 
blages support the Eocene age of the analysed material, they do not yield many 
biostratigraphic events with a well calibrated age in the tropics™. A total
of 76 additional standard smear slides (Supplementary Information) were analysed
for calcareous nannofossils and enabled the improvement of the initial biostrati- 
graphic framework”. Biochronological estimates from the low-latitude nannofossil 
biozonation”® were converted to GTS2012°” using the relative position of each 
biohorizon within the respective magnetochron. In total, 10 robust nannofossil 
tie-points were used (Extended Data Table 1). The base and top of Chiasmolithus 
gigas could not be used at this site owing to the extremely low abundance of this 
species. Therefore, alternative biohorizons in the evolutionary lineage Sphenolithus 
furcatolithoides morph. A-Sphenolithus cuniculus-S. furcatolithoides morph. B were 
used. On the basis of the co-occurrence of two non-synchronous bio-events (base 
Nannotetrina alata gr. and base Nannotetrina cristata) at the same depth (between 
740.95 and 741.63 metres below sea floor (mbsf)) and supported by a sudden shift 
in the nannofossil assemblage, the presence of a hiatus was inferred in Core 35R at 
~741 mbsf. The presence of Nannotetrina alata sensu stricto in combination with
Sphenolithus perpendicularis and transitional forms of sphenoliths at 740.95 mbsf 
suggests that the sediments just above the hiatus are very close in age to the actual 
base of the N. alata group. Therefore, we also include this biohorizon in our age 
model. The lower boundary of the hiatus is based on linear extrapolation of the 
underlying sedimentation rate of 1.27 cm kyr⁻¹. This approach results in a hiatus
of 1.5 Myr (48.0-46.5 Myr ago). To further constrain the age model, several che- 
mostratigraphic tie-points were used. The onset of the carbon isotope excursion 
marking the Palaeocene–Eocene thermal maximum (~56 Myr ago) was recently
identified! at 804.09 mbsf. In addition, the previously identified late Eocene
minimum in osmium isotope ratios** (¹⁸⁷Os/¹⁸⁸Os) at 458.65 mbsf has an age of
34.4 Myr ago in GTS2012 on the basis of the correlation to the Os isotope record 
at the well dated ODP sites 1218 and 1219°°. These age constraints indicate that 
our data span the entire Eocene. The age model is further supported by calculated 
sedimentation rates from selected intervals, where high-resolution colour logs 
showed more than four easily distinguishable cycles. Sedimentation rates were 
calculated by assuming that these smallest-scale alternations are precession-forced, 
and were thus assigned a duration of 21 kyr per cycle. These sedimentation 
rates (blue lines in Extended Data Fig. 1) correspond closely to those based on 
chemostratigraphy and biostratigraphy. Our age model implies that the warming 
interval 590-565 mbsf reflects the MECO, which is further supported by a shift in
osmium isotope ratios that was also identified within the MECO at sites 1263 and
U1333. Owing to a lack of nannofossils in the poorly recovered upper part of 
Hole 959D, linear extrapolation was used for the data points below 466 mbsf. This 
places the Eocene-Oligocene boundary at 447.5 mbsf, which is in good agreement 
with the placing of the Oi1 glacial event on the basis of osmium isotope recovery
after the minimum of 34.4 Myr ago by Ravizza and Paquay™. 
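
The two calculations used in this age model, a sedimentation rate from counted cycles and an age from linear interpolation between tie-points, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 1.0668 m interval thickness is invented for the example, and the interpolation uses only the two chemostratigraphic tie-points quoted above, ignoring the nannofossil tie-points and the hiatus, so the resulting ages are not those of the actual age model.

```python
from bisect import bisect_right

PRECESSION_KYR = 21.0  # duration assigned to one colour cycle in the text


def sed_rate_from_cycles(thickness_m: float, n_cycles: int) -> float:
    """Sedimentation rate (cm/kyr) for an interval spanning n_cycles cycles."""
    return thickness_m * 100.0 / (n_cycles * PRECESSION_KYR)


def age_at_depth(depth: float, tie_points: list) -> float:
    """Linearly interpolate age (Myr ago) at a depth (mbsf).

    tie_points is a list of (depth_mbsf, age_myr) pairs sorted by depth.
    """
    depths = [d for d, _ in tie_points]
    if not depths[0] <= depth <= depths[-1]:
        raise ValueError("depth outside the tie-point range")
    # Index of the tie-point that closes the bracketing segment.
    i = min(bisect_right(depths, depth), len(depths) - 1)
    (d0, a0), (d1, a1) = tie_points[i - 1], tie_points[i]
    return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)


# Tie-points quoted in the text: Os isotope minimum and PETM onset.
ties = [(458.65, 34.4), (804.09, 56.0)]
print(age_at_depth(631.37, ties))        # age halfway between the two depths
print(sed_rate_from_cycles(1.0668, 4))   # hypothetical 4-cycle interval
```

A full age model would pass all tie-points and treat the sediments above and below the hiatus as separate segments.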

Age models other sites. ODP Site 1172. The TEX₈₆-based SST record from Site
1172³⁴¹ is plotted (Fig. 2) on an age model based on the magnetostratigraphy
of Bijl et al.®°, which is in turn largely grounded on the original interpretation by
Fuller and Touchard™. This was supplemented with three well-calibrated dinocyst 
events from Dallanave et al.® (top and base Charlesdowniea edwardsii and top 
Wilsonidium ornatum) instead of the uncertain magnetochron reversals for this 
interval (552-578 mbsf). 

Dahomey Basin, Nigeria. For the Sagamu Quarry and IB10B Core, Nigeria, 
published biostratigraphic and chemostratigraphic age constraints” were used. 
Specifically, base Morozovella subbotinae and base Acarinina soldadoensis were 
used for the Sagamu Quarry (SQ) and base Acarinina soldadoensis, top Morozovella 
acuta, carbon isotope excursion (CIE) onset and top CIE recovery were used for 
IB10B as age-depth tie-points. 

Tropical SST compilation. For the presented compilation, we integrate the new 
ODP Site 959 TEX₈₆-based SST record with several existing SST proxy records,
specifically δ¹⁸O of photosymbiont-bearing planktonic foraminifera Morozovella
spp. (upper mixed layer) and Acarinina spp. (mixed layer) from the SQ, Nigeria”?
and TDP sections*® and near-surface dwelling Turborotalia ampliapertura from
TDP®; Mg/Ca of Morozovella spp. from ODP Site 865° and SQ”, Acarinina spp.
from SQ” and T. ampliapertura from TDP®; TEX₈₆ from ODP Site 925⁷⁴”, Site
929°*⁴, TDP⁴, SQ and the IB10B Core, Nigeria”; and clumped isotope (Δ₄₇) thermometry
of shallow-dwelling large benthic foraminifera from Evans et al.°
(Supplementary Information). We did not include data from South Dover Bridge® 
and Walvis Ridge”! because plate tectonic reconstructions place them outside the 
30° N-30° S latitude band. Age models for all sites were converted to GTS2012 
using published age-depth tie-points. For the Mg/Ca proxy, (normalized) Mg/Ca 
compositions were converted to SST using the calibration from Anand et al.”” and 
the Eocene seawater Mg/Ca reconstruction from Evans et al.°, and using”? H = 0.42
to correct for the power-law dependence of test Mg/Ca values on changing seawater
Mg/Ca ratios”*. Conversion of δ¹⁸O to temperature was done following Erez
and Luz”, assuming a constant ice-free global δ¹⁸O_sw of −1.2‰ VPDB” (Vienna
Pee Dee belemnite) and (constant) latitudinal corrections for TDP and SQ of
0.83‰ and 0.61‰, respectively’”. A +2°C correction to convert reconstructed
T. ampliapertura temperatures to SST (as used in the original publication) was 
omitted here. It should be noted that different seawater chemistry assumptions for 
the δ¹⁸O and Mg/Ca proxies may result in shifts in reconstructed temperatures,
but do not qualitatively change trends or the correspondence between trends. 
Multiple measured specimens per sample in the original studies have been averaged 
into one value for this compilation. For the TEX₈₆-based records, samples with
aberrant GDGT ratios were removed following Inglis et al.°. The logarithmic TEX₈₆ᴴ
calibration of Kim et al.!* is presented in the main text and a full supplementary
analysis using the linear BAYSPAR” calibration is provided. For the BAYSPAR
calibration, the default search tolerance (2 standard deviations of the raw TEX₈₆
dataset) was used for Site 959, 925 and 929, which yields a representative set of 
low-latitude calibration localities. The search tolerance was stretched to 0.15 TEX 
units for the Dahomey Basin and TDP records, in order to not only sample the 
possibly anomalous modern Red Sea**, but also include a broad representative 
sample of low-latitude localities. 
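
The Mg/Ca conversion described above combines an exponential temperature calibration with a power-law correction for seawater Mg/Ca. A minimal sketch, assuming the form Mg/Ca_test = B × (Mg/Ca_sw,past / Mg/Ca_sw,modern)^H × exp(A × T): only H = 0.42 comes from the text. The constants A = 0.09 and B = 0.38 (the multispecies values usually attributed to the Anand et al. calibration) and the modern seawater ratio of 5.2 mol/mol are assumptions of this example, as are the function name and the sample values.

```python
import math


def mgca_to_sst(mgca_test: float, mgca_sw_past: float, h: float = 0.42,
                a: float = 0.09, b: float = 0.38,
                mgca_sw_modern: float = 5.2) -> float:
    """Invert Mg/Ca_test = B * (sw_past / sw_modern)**H * exp(A * T) for T (deg C).

    mgca_test is in mmol/mol; the seawater ratios are in mol/mol.
    """
    b_corrected = b * (mgca_sw_past / mgca_sw_modern) ** h
    return math.log(mgca_test / b_corrected) / a


# Hypothetical foraminiferal Mg/Ca of 3.5 mmol/mol, evaluated for modern
# seawater and for a lower (hypothetical) Eocene seawater Mg/Ca of 2.2 mol/mol.
print(mgca_to_sst(3.5, 5.2))
print(mgca_to_sst(3.5, 2.2))
```

For the same measured test Mg/Ca, a lower seawater Mg/Ca yields a higher reconstructed temperature, which is why the seawater assumption shifts absolute values without changing trends.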

Global deep-ocean temperature compilation. We base our deep-ocean temper- 
ature compilation on the benthic isotope stack previously compiled by Zachos 
et al. and supplement this with several high-resolution benthic isotope records, 
specifically from ODP Site 690⁷*”?, ODP Site 748”, ODP Site 1218*°, ODP Site
1209*!, ODP Site 1258°", ODP Site 1262°? and ODP Site 1263*⁴. After the respective
species-specific corrections for disequilibrium vital effects*° were applied,
δ¹⁸O-to-temperature conversion was done following Erez and Luz’®, assuming a
constant ice-free global δ¹⁸O_sw of −1.2‰ VPDB’®. Age models for all sites were
converted to GTS2012 using published age–depth tie-points.
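
As a worked illustration of the δ¹⁸O-to-temperature step, the sketch below uses the quadratic form commonly attributed to Erez and Luz, T = 17.0 − 4.52(δc − δw) + 0.03(δc − δw)². The coefficients are quoted from memory of that calibration and should be treated as an assumption; the default δw of −1.2‰ is the ice-free value given in the text, and the function name and input values are hypothetical.

```python
def d18o_to_temperature(d18o_calcite: float, d18o_sw: float = -1.2) -> float:
    """Temperature (deg C) from calcite d18O and seawater d18O (both per mil,
    same reference scale), using T = 17.0 - 4.52*d + 0.03*d**2 with
    d = d18o_calcite - d18o_sw."""
    diff = d18o_calcite - d18o_sw
    return 17.0 - 4.52 * diff + 0.03 * diff ** 2


# Hypothetical vital-effect-corrected benthic values (per mil).
for d18o_c in (-0.5, 0.0, 1.0):
    print(d18o_c, round(d18o_to_temperature(d18o_c), 2))
```

Near the relevant range the relation is close to linear, at roughly 4.5 °C per 1‰, so a given offset in the assumed seawater δ¹⁸O translates almost directly into a fixed temperature offset.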

CO₂ compilation. The compiled CO₂ record plotted in Fig. 2 derives from boron
isotopes from TDP®*, with 68% confidence intervals, as reported in Foster et al.°”, 
and alkenones from ODP Site 612° and Site 925”°. 

Meridional temperature gradients. As an approximation for the pole-to-Equator 
temperature difference or MTG, we calculate the difference between tropical 
mean SST and deep-ocean temperatures. The latter are better constrained than 
high-latitude SSTs and exclude potential summer temperature biases that might 
plague available high-latitude SST records. We use deep-ocean temperatures based 
on δ¹⁸O, as these are better constrained than those based on Mg/Ca, particularly
because of the large uncertainties regarding seawater Mg/Ca values as well as 
larger uncertainty between different calibrations and corrections used to convert 
Mg/Ca to temperature. We note that our approach of using the LOESS-fitted data
provides robust estimates of long-term changes in MTG, but is less appropriate for 
considering transient events, as fitted event MTGs (for example, for the PETM and 
MECO) are very dependent on the bandwidth of the fit and the specific records 
used (Extended Data Fig. 8). 

Sensitivity to late Eocene ice volume. The Cenozoic benthic foraminiferal δ¹⁸O
signal reflects both deep-water temperature and global ice volume changes. 
Although it is unlikely that large Antarctic ice sheets were present in the warmest
interval of the Cenozoic during the early Eocene, the extent of middle–late
Eocene Antarctic glaciation is more uncertain (see, for example, Miller et al.®°, 
Barker et al.”’ and Gasson et al.”!). Recent work argues for possible early middle 
Eocene glaciation” but the dating of these sediments is highly uncertain. Although 
there might be evidence for glacial activity, the interpreted presence of large East 
Antarctic ice sheets in the early middle Eocene is highly controversial, certainly 
in light of very warm temperatures on the East Antarctic margin”®. Nevertheless, 
initial small ice caps in the middle–late Eocene would have had relatively enriched
isotopic compositions of −20‰ to −35‰ VSMOW” (Vienna Standard Mean
Ocean Water) relative to mean modern Antarctic ice (−54‰ VSMOW). We assess
the effect of a range of middle–late Eocene ice volumes with different isotopic
compositions on the mean δ¹⁸O of Eocene seawater (Extended Data Fig. 6d). This
demonstrates that the effect of ice volume was probably not more than ~0.25‰,
or ~1°C, in the latest Eocene. To further illustrate this, we present both the record
of ice-free deep-ocean temperature evolution and a second line based on a linear
build-up of late Eocene ice volume from 39.5 Myr ago (post-MECO) onwards to
a latest-Eocene (34.0 Myr ago) maximum of 10⁷ km³ (refs’>°°) with an isotopic
composition of −25‰ VSMOW™ (Extended Data Fig. 6a). This makes a maximum
difference of about 0.8 °C (Δδ¹⁸O_sw of 0.18‰) in the latest Eocene. We further
propagate this uncertainty into the analysis of MTGs and polar amplification
factors (Extended Data Fig. 6b, c). 
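
The size of the ice-volume effect follows from a simple isotopic mass balance between ocean and ice. The sketch below is a rough illustration, not the calculation used for Extended Data Fig. 6d: the total water volume (1.335 × 10⁹ km³, roughly the modern ocean), the ice-to-water density ratio (0.917) and the ice-free mean seawater value (−1.0‰ VSMOW) are all assumptions of this example, and the function name is hypothetical.

```python
def seawater_d18o_shift(ice_volume_km3: float, d18o_ice: float,
                        d18o_ice_free: float = -1.0,
                        total_water_km3: float = 1.335e9,
                        ice_density_ratio: float = 0.917) -> float:
    """Enrichment of seawater d18O (per mil) caused by storing water as ice.

    Mass balance: V_ocean * d_ocean + V_ice * d_ice = V_total * d_ice_free,
    with V_ice expressed as water-equivalent volume.
    """
    v_ice = ice_volume_km3 * ice_density_ratio   # water-equivalent volume
    v_ocean = total_water_km3 - v_ice
    return v_ice * (d18o_ice_free - d18o_ice) / v_ocean


# Latest Eocene scenario from the text: 1e7 km3 of ice at -25 per mil.
print(seawater_d18o_shift(1e7, -25.0))
# Sensitivity to the isotopic composition of the ice (range quoted in the text).
for d_ice in (-20.0, -35.0):
    print(d_ice, seawater_d18o_shift(1e7, d_ice))
```

Under these assumptions the 10⁷ km³ scenario gives a shift of a little under 0.2‰, the same order as the value quoted above; the exact figure depends on the assumed ocean volume and ice-free seawater composition.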

CESM 1 model simulations. The CESM 1 simulations share the same generalized 
Eocene palaeogeography to assess the effect of changing CO₂ by itself, and were
all run for more than 3,000 yr to equilibrium. Simulations using an earlier, and 
generally similar, version of this model were found to produce the best match to 
early Eocene proxy temperatures within a multi-model ensemble", and prelimi- 
nary comparison revealed that these new simulations are slightly improved over 
the earlier version for the early Eocene”’. Results from the lower-CO₂ simulations
(560 p.p.m. and 1,120 p.p.m.) and further information on the model can be found 
in Goldner et al.”°. This version of CESM has a modern ‘fast’ climate sensitivity of 
2.9°C per CO₂ doubling” and a nearly constant ‘slow’ climate sensitivity (ESS) of
3.5°C per CO₂ doubling in the Eocene simulations used here. For comparison with
the proxies in this study, the four simulations with varying CO₂ were assigned specific
ages by matching the simulated deep-ocean temperatures to the proxy-based
deep-ocean temperature reconstruction curve. We then compared the resulting 
SSTs at the same localities as the main sites in our proxy compilation (ODP sites 
865, 925/929 and 959 and TDP) and surface-to-deep gradients to evaluate model 
performance (Supplementary Information). The temperature at the proxy data 
localities was sampled in a 4° radius. This approach avoids the circularity of adjust- 
ing the climate model radiative forcing to match surface temperature records and 
provides a target that circumvents the uncertainty introduced by the various errors 
and uncertainties in surface temperature records. In these simulations, bottom 
water temperatures in the 4,480 p.p.m. scenario (simulation EO4) are represent- 
ative of a hot early Eocene climatic optimum extreme (deep-ocean temperature 
of 13-14°C, following the conventions of Huber and Caballero!*) whereas the 
560 p.p.m. scenario (simulation EO1) is comparable to the latest Eocene (deep- 
ocean temperature of 4-5 °C), with intermediate simulations (EO2 and EO3) being 
in between and comparable to the middle Eocene. 

Polar amplification factor calculations. First, to obtain an estimate of the factor 
by which polar temperature change is amplified relative to the tropics (that is, the 
polar amplification factor), we performed a Deming regression of the Site 959 
record against the deep-sea stack of temperatures, accounting for errors in both 
variables. Data were binned into 1-Myr bins from 34 to 58 Myr ago. We did not 
include data from the EOT and earliest Oligocene, to exclude major effects of ice 
volume changes on seawater δ¹⁸O. To assess the robustness of the single regression,
we followed a probabilistic approach, using Monte Carlo resampling with full prop- 
agation of errors. First, we generated 1,000 iterations of both the tropical SST and 
deep-ocean temperature datasets. In these iterations, each data point was resam- 
pled within the 95% confidence limits of its propagated analytical plus calibration 
uncertainty, assuming Gaussian distribution of errors. Using these, we performed 
1,000 iterations of a Deming regression of deep-ocean temperature against tropi- 
cal SST, with data binned into 1-Myr bins from 34 to 58 Myr ago and propagated 
errors related to the binning used in the regression. We plotted the resulting suite 
of 1,000 slopes as a probability density function of the polar amplification factor. 
This exercise was performed using the full tropical temperature compilation and 
the single Site 959 record. We additionally performed a Deming regression of 
the Site 959 record against the high-latitude Site 1172 record as a supplementary
analysis. The latter analysis generates a similar polar amplification factor, but with
larger scatter and uncertainty. This is due to the smaller number of data points in
the Site 1172 SST record relative to the benthic δ¹⁸O stack, as well as differences
in the detailed pattern of Eocene cooling between Site 1172 compared to Site 959 
and the deep-ocean temperature record. 
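
The regression procedure described above can be sketched in a few lines. This is a simplified illustration rather than the analysis code: it uses the closed-form Deming slope with a fixed error-variance ratio, draws each resampled value from a Gaussian with a single 1σ uncertainty per variable, and runs on synthetic, already binned data. All names and numbers in the example are hypothetical.

```python
import math
import random
import statistics


def deming_slope(x, y, delta=1.0):
    """Deming regression slope of y on x.

    delta is the ratio of the y-error variance to the x-error variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    root = math.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)
    return (syy - delta * sxx + root) / (2.0 * sxy)


def monte_carlo_slopes(x, y, sx, sy, n_iter=1000, seed=0):
    """Resample x and y within Gaussian errors and refit the slope each time."""
    rng = random.Random(seed)
    delta = (sy / sx) ** 2
    slopes = []
    for _ in range(n_iter):
        xs = [rng.gauss(xi, sx) for xi in x]
        ys = [rng.gauss(yi, sy) for yi in y]
        slopes.append(deming_slope(xs, ys, delta))
    return slopes


# Synthetic binned data: tropical SST (x) and deep-ocean temperature (y)
# constructed with a slope (amplification factor) of exactly 2.
tropical = [30.0 + 0.5 * i for i in range(9)]
deep = [5.0 + 2.0 * (t - 30.0) for t in tropical]
slopes = monte_carlo_slopes(tropical, deep, sx=0.3, sy=0.3)
print(statistics.median(slopes), statistics.stdev(slopes))
```

The spread of the resulting slopes is what is plotted as the probability density function of the polar amplification factor; with fewer or noisier data points the distribution widens, as noted for the Site 1172 comparison.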

ESS calculations. To provide estimates of Eocene ESS sensu Lunt et al.°8, we com- 
bined our proxy and model reconstructions of temperature with the few available 
CO₂ reconstructions based on boron isotopes®, involving 1 sample for the early
(54-49 Myr ago), 2 samples for the middle (48-42 Myr ago) and 1 sample for the 
late (38-35 Myr ago) Eocene. We derived temperatures by sampling the proxy 
compilation within the designated age brackets. We use tropical and deep-ocean 
temperature change (dT) as minimum and maximum estimates of dT. Between 
these, a uniform ‘flat’ probability distribution was assumed. We converted changes 
in boron-based CO₂ estimates to radiative forcing in W m⁻² using the radiative
forcing fit from Byrne and Goldblatt”. With the above approach, we derived estimates
of ESS in K W⁻¹ m² for the early Eocene compared to the middle and late
Eocene and to the preindustrial temperature. Uncertainties are based on propagated
uncertainties of temperature change and radiative forcing derived by resampling
these datasets 1,000 times within their 95% confidence limits (propagated
analytical plus calibration uncertainty for temperature, reported 95% confidence
limits from the original work? for CO₂). In this, we removed radiative forcings <0,
that is, we assumed that there is no negative forcing associated with increasing CO₂.
Given the good match between proxies and the presented model simulations, we
also calculated ESS using the model-derived global mean temperatures and CO₂
proxy data. 
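
The structure of the ESS calculation can be illustrated with a short resampling sketch. Note the substitution: instead of the Byrne and Goldblatt fit used in this study, the example uses the widely known simplified expression ΔF = 5.35 ln(C/C₀) as a stand-in, so its numbers are not comparable to the reported results. The CO₂ values, their uncertainties, the temperature bounds and all function names are hypothetical.

```python
import math
import random


def co2_forcing(c_ppm: float, c0_ppm: float) -> float:
    """Radiative forcing (W m-2) from the simplified logarithmic expression
    dF = 5.35 * ln(C / C0); a stand-in for the fit used in the study."""
    return 5.35 * math.log(c_ppm / c0_ppm)


def ess_samples(dt_min, dt_max, co2_high, co2_high_sd, co2_low, co2_low_sd,
                n_iter=1000, seed=0):
    """Monte Carlo ESS estimates (K per W m-2).

    dT is drawn from a flat distribution between dt_min and dt_max; CO2 values
    are drawn from Gaussians; draws with non-positive forcing are discarded.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_iter):
        d_t = rng.uniform(dt_min, dt_max)
        c_high = rng.gauss(co2_high, co2_high_sd)
        c_low = rng.gauss(co2_low, co2_low_sd)
        if c_high <= 0 or c_low <= 0:
            continue
        forcing = co2_forcing(c_high, c_low)
        if forcing <= 0:          # no negative forcing for increasing CO2
            continue
        samples.append(d_t / forcing)
    return samples


# Hypothetical inputs: warming between 5 and 9 K for CO2 of 1400 vs 800 p.p.m.
ess = sorted(ess_samples(5.0, 9.0, 1400.0, 300.0, 800.0, 150.0))
print(len(ess), ess[len(ess) // 2])   # retained draws and their median
```

Dropping the non-positive forcings truncates the distribution, which is one reason the resulting probability distributions are asymmetric when CO₂ uncertainties are large.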

Data availability. The data supporting the findings of this study are available 
within the paper and its Supplementary Information. Original raw data (palynol- 
ogy counts and GDGT concentrations and chromatograms) are available from the 
corresponding author upon request. 

Code availability. The model used in this study is NCAR CESM 1 with CAM4
atmosphere, which is freely available from NCAR
(http://www.cesm.ucar.edu/models/cesm1.0/).

Extended Data Fig. 1 | Augmented age model of Hole 959D. Age-depth
plot showing calcareous nannofossil and chemostratigraphic tie-points
(diamonds; vertical error bars indicate the minimum and maximum
depth of the tie-point), as presented in Extended Data Table 1b. B, BC and
T stand for base, base common and top, respectively. Blue-shaded regions
represent depth intervals for which sedimentation rates (blue lines) were 
calculated. The hiatus of ~1.5 Myr in Core 35 is indicated as a red curly 
line. Epochs and ages are expressed in Myr ago (Ma), following GTS2012. 

Extended Data Fig. 2 | Comparison between different TEX₈₆-to-SST
calibrations and different GDGT ratios. a, TEX₈₆–SST calibration lines
(trend lines for BAYSPAR) for one logarithmic and several linear
calibrations. Plotted symbols are the Site 959 TEX₈₆ record, to illustrate
which part of the calibration is relevant for this study. Compared
calibrations are: BAYSPAR'’**! with default settings (search tolerance for 2
TEX₈₆ standard deviations, 0.13; dark-grey line, dark-grey diamonds),
BAYSPAR with increased search tolerance (0.2) (dashed line, light-grey
diamonds), Kim et al.!® logarithmic TEX₈₆ᴴ core-top calibration (red line,
red diamonds), linear core-top calibration!® (light-blue line) and linear
subset core-top calibration without Red Sea and polar ocean data’® (dark-
blue line). Note that the logarithmic TEX₈₆ᴴ calibration diverges strongly
from the linear BAYSPAR and subset calibrations at TEX₈₆ values of
>0.8. b, Site 959 SST record using different TEX₈₆ calibrations.
Calibrations and line colours and types are as in a. c, Ratio of crenarchaeol
to GDGT-0 against TEX₈₆. Data are from a core-top compilation"! (black
circles; Red Sea subset, purple circles) and our Site 959 record (red
squares). d, Ring index sensu Zhang et al.*° against TEX₈₆. Data are from a
core-top compilation (black circles; Red Sea subset, purple circles) and our
Site 959 record (red squares). The exponential regression line of Zhang
et al. through the core-top data is plotted as a black line.

Extended Data Fig. 4 | Regression analysis between reconstructed SST
and abundance of upwelling indicators. a, TEX₈₆ᴴ-based SST (red
diamonds, upper left vertical axis), protoperidinioid abundance
(percentage of total dinocyst assemblage; brown dots, right vertical axis)
and TOC (percentage of sediment; black dots, lower left vertical axis)
records of ODP Site 959. Dashed lines represent a hiatus in Site 959. Age is
in GTS2012. b, Regression analysis between SST and percentage of
protoperidinioid dinocysts of total dinocyst assemblage, showing a
non-significant relationship with a very low fit (brown line, 90% confidence
interval shown as brown shading; R² = 0.00, P = 0.75) and a better fit
(R² = 0.35) that is significant (P < 0.01) when only the late Eocene
(post-MECO) part of the record is considered (blue-grey line; 90% confidence
interval is shown as blue-grey shading). c, Regression analysis between
SST and percentage of TOC in sediment, showing a significant negative
correlation for the whole record (R² = 0.39, P < 0.001; dark-grey line, with
the 90% confidence interval shown as dark-grey shading) and the late
Eocene subset (R² = 0.37, P < 0.01; blue-grey line, with the 90%
confidence interval shown as blue-grey shading).

Compilation presented in Fig. 2, here plotted per site and proxy, with data
sources in the key. The abbreviations Moro., Aca., and Turbo. stand for 
foraminifera genera Morozovella, Acarinina and Turborotalia, respectively. 
The dashed line in the Site 959 record represents a hiatus. Conservative 
estimates of propagated calibration and analytical errors (1 s.d.) are 


uncertainties are as reported in the original study®, with the minimum and 
maximum per-sample uncertainty. Uncertainties are plotted on the same 
relative vertical temperature scale as the data to facilitate comparison. The 
age is in Myr ago, following GTS2012. 

Extended Data Fig. 6 | Sensitivity of main results to late Eocene ice
volume. a, Top, tropical SST compilation; proxy data are compiled as
described in Methods (red symbols). The fitted LOESS model is plotted
as a black line and the 95% confidence interval as grey shading. Bottom,
deep-ocean temperature compilation; δ¹⁸O-based proxy data are compiled
as described in Methods. Ice-free deep-ocean temperatures and fitted
LOESS model are shown as grey dots and line, respectively, and the deep-
ocean temperature compilation and fitted LOESS model including late
Eocene ice volume effect (Methods) as blue dots and line, respectively.
95% LOESS confidence intervals are shown as shading. b, Calculated
MTG based on LOESS fits of proxy data (lines; propagated 95% confidence
intervals are shown as silhouettes). The black line with grey silhouette
shows results obtained using ice-free deep-ocean temperatures, and
the blue line with blue silhouette includes the late Eocene ice volume
effect on the deep-ocean temperature. c, Proxy (blue diamonds, tropical
compilation; red diamonds, Site 959) deep-ocean temperature, including
the late Eocene ice volume effect, against tropical SST. Lines represent
Deming regression analysis through proxy data. The slope (polar
amplification factor) is 2.07 ± 0.25 (±1 standard error) for the tropical
compilation and 1.19 ± 0.06 for Site 959. Proxy data are grouped into 1-Myr
bins from 34-58 Myr ago, with error bars representing one standard
deviation due to binning. d, Sensitivity of δ¹⁸O of Eocene seawater
(‰ VSMOW) to the build-up of 0-10⁷ km³ of ice with varying isotopic
composition.

Extended Data Fig. 7 | Linear relationship between high-latitude and
tropical SST. Site 1172 TEX₈₆-based SST (record plotted in Fig. 2) against
Site 959 TEX₈₆-based SST. Lines represent Deming regression analysis
through proxy data (polar amplification factor, 1.66 ± 0.57). Proxy data are
grouped into 1-Myr bins from 34-58 Myr ago, with error bars representing 
one standard deviation due to binning. Peak PETM and peak MECO SSTs 
are plotted as separate points, which fall within the uncertainty of the 
regression line. 

Extended Data Fig. 8 | PETM temperature gradient proxy-model [...]
small bandwidth (0.25 times the GCV-optimized span) that tracks deep-ocean
PETM temperature more closely is shown as the blue line, with the
95% confidence interval as blue shading. Data are plotted together with
the PETM simulation of Kiehl and Shields”* (black open squares, seasonal [...]
over the course of the event. Nevertheless, peak PETM MTG matches the
simulation CP_PETM poorly. The age is in Myr ago, following GTS2012.

Extended Data Fig. 9 | Probability distributions of Eocene Earth system
sensitivity. a, b, ESS estimates using proxy (a) and model (b) temperatures
in combination with proxy-based CO₂ concentrations, derived as
described in Methods. Eocene ESS is separated into the late Eocene relative
to the EECO (red), the middle Eocene relative to the EECO (purple) and
the middle Eocene relative to late Eocene (blue). ESS estimates of the
EECO relative to preindustrial temperature (black) have lower error owing
to the high precision of preindustrial CO₂ concentration and temperature,
but include additional long-term non-CO₂ effects.

Extended Data Table 1 | Palaeolatitude and age constraints of Site 959 over the Eocene

a, Palaeolatitude at Site 959

Matthews et al. 2016    Torsvik et al. 2012
0.982 °N                9.826 °S
0.779 °N                9.480 °S
0.581 °N                9.130 °S
0.587 °N                7.405 °S
0.600 °N                5.674 °S
0.942 °N                4.568 °S
1.291 °N                3.462 °S

b, Age-depth tie-points

Column headings: Event; Species / Proxy; Minimum depth (mbsf); Mean depth (mbsf); One-sided error (m); Age GTS2012; Reference.

Event: Latest Eocene Os isotope minimum; MECO Os isotope minimum; Base common; Base; Bot hiatus; Top; Base; Top; Top; Onset PETM isotope excursion.

Species / Proxy: ¹⁸⁷Os/¹⁸⁸Os; ¹⁸⁷Os/¹⁸⁸Os; Reticulofenestra umbilicus; Sphenolithus furcatolithoides morph. B; Sphenolithus cuniculus; Sphenolithus furcatolithoides morph. A; Nannotetrina alata group; Discoaster lodoensis; Discoaster sublodoensis; Tribrachiatus orthostylus; Tribrachiatus contortus; δ¹³C; Discoaster multiradiatus.

Minimum depth (mbsf): 671.56; 701.38; 731.73; 740.95; 740.95; 746.73; 755.06; 774.49; 784.05; 821.92.

Reference: Ravizza and Paquay 2008; van der Ploeg et al. in press; Shafik et al. 1998, this study; This study; Shafik et al. 1998, this study; This study; Shafik et al. 1998, this study; This study; This study; Shafik et al. 1998, this study; This study; Shafik et al. 1998; Frieling et al. 2018; Shafik et al. 1998.

a, Palaeolatitudes reconstructed with GPlates using the hotspot reference frame of Matthews et al.2° and the palaeomagnetic reference of Torsvik et al.!°°. Present latitude is 3.6276° N and longitude is
2.7352° W. b, Bio- and chemostratigraphic age-depth tiepoints (from refs 21:55:58.0 and this work) used in developing the age model for the Eocene of Site 959.

Global surface warming enhanced by weak Atlantic overturning circulation

Evidence from palaeoclimatology suggests that abrupt Northern
Hemisphere cold events are linked to weakening of the Atlantic 
Meridional Overturning Circulation (AMOC)', potentially by 
excess inputs of fresh water”. But these insights—often derived 
from model runs under preindustrial conditions—may not apply 
to the modern era with our rapid emissions of greenhouse gases. 
If they do, then a weakened AMOC, as in 1975-1998, should have 
led to Northern Hemisphere cooling. Here we show that, instead, 
the AMOC minimum was a period of rapid surface warming. 
More generally, in the presence of greenhouse-gas heating, the 
AMOC’s dominant role changed from transporting surface heat
northwards, warming Europe and North America, to storing heat 
in the deeper Atlantic, buffering surface warming for the planet as 
a whole. During an accelerating phase from the mid-1990s to the 
early 2000s, the AMOC stored about half of excess heat globally, 
contributing to the global-warming slowdown. By contrast, since 
mooring observations began? in 2004, the AMOC and oceanic heat 
uptake have weakened. Our results, based on several independent 
indices, show that AMOC changes since the 1940s are best explained 
by multidecadal variability®, rather than an anthropogenically 
forced trend. Leading indicators in the subpolar North Atlantic 
today suggest that the current AMOC decline is ending. We expect 
a prolonged AMOC minimum, probably lasting about two decades. 
If prior patterns hold, the resulting low levels of oceanic heat uptake 
will manifest as a period of rapid global surface warming. 

As an analogy of the flow of energy in our climate system, consider 
the filling of a bucket of water from a tap at the top. The feed rate of the 
tap is an analogue of the top-of-atmosphere radiative imbalance—the 
net heating—of our planet, with the water level in the bucket analogous 
to surface warming. The sink at the bucket bottom drains into a larger 
bucket below (the deeper oceans). If the drain rate is the same as the 
feed rate from the tap at the top, the water level in the bucket does not 
rise (hiatus of surface warming). If the drain is plugged, the water level 
will rise rapidly in the bucket (rapid surface warming). AMOC controls 
about half of the variation of this ‘drain rate’.
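
The bucket analogy amounts to a one-line balance: the change in water level per time step is the feed rate minus the drain rate. A toy sketch, with arbitrary units and hypothetical names and values, shows the two limiting cases described above.

```python
def simulate_bucket(feed_rate, drain_rates, level0=0.0, dt=1.0):
    """Water level over time for a constant feed and a time-varying drain.

    Each step adds (feed_rate - drain_rate) * dt to the level.
    """
    levels = [level0]
    for drain in drain_rates:
        levels.append(levels[-1] + (feed_rate - drain) * dt)
    return levels


feed = 1.0
# Drain matches the feed: the level stays flat (a hiatus in surface warming).
print(simulate_bucket(feed, [1.0] * 5))
# Drain plugged: the level rises at the full feed rate (rapid surface warming).
print(simulate_bucket(feed, [0.0] * 5))
# Drain at half the feed rate: the level rises at half the rate.
print(simulate_bucket(feed, [0.5] * 5))
```

In the analogy, a time-varying drain corresponds to the changing rate at which heat is stored in the deeper oceans, with the water level standing in for surface warming.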

Figure 1 quantifies the energy budget of our climate system, using the 
subsurface ocean heat content (OHC) measured mostly by a system of 
autonomous profiling Argo floats, during a period, 2000-2014, when 
the ‘drain rate’ was large. The total OHC, as approximated by that in the 
upper 1,500 m of the oceans, is increasing at a rate of about 0.42 ± 0.02
W m⁻², consistent with radiative imbalance’. The upper 200 m roughly
corresponds to the mixed layer globally. Through wind and turbulent 
mixing, variations of sea surface temperature (SST) and mixed-layer 
OHC are highly statistically correlated (r=0.82 in 13-month running 
mean). Figure 1 shows that both were in a warming slowdown for this 
period. Why the upper 200 m OHC was in a warming slowdown is 
clear: the increase in heat storage below 200 m, about 89 zettajoules 
(1 ZJ = 10²¹ J). This amount of heat is equivalent to 180 years of the
world’s energy consumption at the current rate, and any future variation 
even within this observed range will have important consequences for 
the surface temperature. 
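As a rough consistency check on these magnitudes, the heating rate can be converted into accumulated energy. The sketch below is a back-of-envelope calculation under stated assumptions, not the analysis used for Fig. 1. It assumes that the 0.42 W m⁻² is expressed per unit area of the whole Earth surface (taken as 5.1 × 10¹⁴ m²), that the period 2000-2014 spans 15 years, and that world energy consumption is 0.5 ZJ per year (from the Fig. 1 legend, which equates one zettajoule to twice the annual consumption).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.156e7 s
EARTH_SURFACE_AREA_M2 = 5.1e14          # assumed normalisation area
ZJ = 1e21                               # one zettajoule in joules


def heat_accumulated_zj(rate_w_m2, years, area_m2=EARTH_SURFACE_AREA_M2):
    """Energy (ZJ) accumulated by a constant flux over the given area."""
    return rate_w_m2 * area_m2 * years * SECONDS_PER_YEAR / ZJ


# Total 0-1,500 m uptake over 2000-2014 under the assumptions above.
total_0_1500_zj = heat_accumulated_zj(0.42, 15)

# Storage below 200 m quoted in the text, and its energy-use equivalent.
stored_below_200_zj = 89.0
world_consumption_zj_per_year = 0.5     # assumed, from the Fig. 1 legend
years_equivalent = stored_below_200_zj / world_consumption_zj_per_year

print(round(total_0_1500_zj), years_equivalent)
```

Under these assumptions the total uptake comes to roughly 100 ZJ, so a storage of 89 ZJ below 200 m leaves little for the upper 200 m, in line with the slowdown described above; 89 ZJ divided by 0.5 ZJ per year gives 178 years, which the text rounds to 180.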


If the radiative imbalance and the heat storage below 200 m were to 
remain the same, the 0-1,500 m OHC would still increase at the same 
rate as the radiative imbalance, but the 0-200 m OHC curve would lie 
on the 0-1,500 m curve, increasing at the same rate, or about 0.23 °C 
per decade. Our best estimate for the next two decades, allowing for 
some increase in ocean storage, is 70% of that rate, at 0.16 °C per decade 
(see Methods), close to the 25-year trend of 0.177 °C per decade of the 
last rapid warming period in the twentieth century’. 

The inset of Fig. 1 shows how the global increase in OHC storage 
between 200 m and 1,500 m is partitioned among the various oceans. 
The Pacific and the Indian oceans dominate the horizontal exchanges 
of heat in the upper 300 m®!°, and the Atlantic and the Southern oceans 
dominate the vertical redistribution'!. They accounted for about 70% of 
the global heat storage increase in the 200-1,500 m layer during 2000- 
2014, divided between the North Atlantic, which is dominant before 
2005, and the Southern Ocean after 2005. The subsurface warming in 
the Southern Ocean started in 1993 according to the data available (see 
below), and was attributed to the southward displacement and inten- 
sification of the circumpolar jet’, caused in large part by the Antarctic 
ozone hole’’. The North Atlantic’s role appears to be cyclic on decadal 
timescales, with AMOC in an accelerating phase before 2005. 

AMOC transports warm saline surface water found in the subtropi- 
cal Atlantic to the subpolar Atlantic, where heat loss to the cold atmos- 
phere increases its density. Aided by its high salinity it sinks and returns 
southward at depth. When AMOC is stronger (weaker), more (less) 
of the warm and saline water is found in the subpolar Atlantic, and 
subsequent sinking subducts more (less) heat there, as demonstrated 
in Fig. 2. The contrast is dramatic between periods when AMOC is 
increasing and when it is decreasing. Why AMOC sometimes accelerates 
or declines is more complicated. It could be responding to external 
forcing, such as the freshening of the subpolar waters from 
melting ice at the end of the Little Ice Age'?. Or, AMOC could be part 
of a natural, multidecadal variability involving feedbacks between the 
density effect of salinity on deep convection in Labrador and the Nordic 
Seas, and the subsequent induced northward transport of surface salin- 
ity reinforcing the deep convection". 

AMOC is commonly believed to be slowing on centennial times- 
cales owing to global warming. The RAPID/MOCHA mooring array, 
deployed in 2004? off the coast of Florida to monitor AMOC, soon 
afterwards recorded its weakening’. The decadal decline, however, is 
ten times larger than the predicted forced response®, causing concerns 
about its long-term trend and possible deficiencies of the models used. 
Figure 3a, constructed from various independent proxies from 1945 
to the present (see Extended Data Fig. 1 for unfiltered time series and 
Extended Data Fig. 2 for error bars), shows that it is dominated instead 
by reversing phases. The weakening AMOC, by 3.7 Sverdrups (Sv) since 
2005 measured by the RAPID/MOCHA array, was actually preceded 
by an acceleration’>'*. Altimetry data of sea-surface heights (SSH) 
available since 1993'7 were used to deduce! via geostrophic balance 
that at 41° N AMOC sped up by 4 Sv from the early 1990s to 2005, 
consistent with Zhang’s subsurface fingerprint proxy®. We use multiple 
independent proxies to infer subpolar AMOC strength back in time to 
1945. Many of the proxies used here have been validated by models: 
Zhang's subsurface temperature fingerprint was highly coherent with 
AMOC strength*!?” at low frequencies in the model (GFDL CM2.1) at 
mid-latitudes. The subpolar gyre SST proxy”!, and the upper ocean 
subpolar salinity proxy”? were also model-validated. Along with the long 
record of tide gauges along the east coast of the USA”’, these proxies 
consistently indicate a period of low AMOC from the mid-1970s to 
the 1990s. The shading in Fig. 3 shows that this period coincided with 
a period of rapid surface warming. See also Extended Data Fig. 3 for 
the coincidence of Atlantic OHC change and global surface warming. 
See Methods for model-observation reconciliation. 

Fig. 1 | Quantifying the global heat budget and the partition among 
ocean basins in the two periods 2000-2004 and 2005-2014. The SST 
from ERSST.v4 is shown as a black curve and the 0-200-m OHC from 
the ISHII and Scripps datasets (see Methods) is shown as an orange curve, 
showing that they co-vary and that both are in a warming slowdown, 
while the total OHC, as approximated by the 0-1,500-m OHC (red 
curve), is increasing at the regressed linear rate of 0.42 W m⁻² (red dashed 
straight line). This excess heat from forcing is sequestered below 200 m. 
The orange-shaded region represents the additional amount of heat 
stored in the 200-1,500 m layer since 2000, about 89 ZJ. One zettajoule 
is equivalent to twice the world’s annual energy consumption. If this 
additional storage were absent, the upper 200 m would have increased 
at the rapid rate of the red curve. We adjusted the data for the Southern 
Ocean to remove a possible artefact due to the rapid transition from 
no-Argo to the Argo observing platform around 2002-2003*%. The inset 
shows the division of the 89 ZJ of global ocean increase in heat storage in 
the 200-1,500 m layer into the four ocean basins and two periods. 35° S 
marks the northern boundary of the Southern Ocean and the southern 
boundary of the Atlantic, Pacific and Indian oceans. The error bars are 
one-standard-deviation errors of the linear regression. 

We call AMOC+ (AMOC-) the phase when the AMOC strength is 
above (below) climatology (based on the subpolar salinity, which has 
a long record with no trend). The high (+) phase consists of two rapid 
subphases. The increasing subphase (AMOC_up) started in 1993, from 
the low point in AMOC-, first slowly and then rapidly, peaking in 
2005. It is then followed by a rapid decreasing subphase (AMOC_down) 
(2005 to the present) (Fig. 3a). At low values of overturning (AMOC-) 
the strength is relatively level even though there are short-term 
fluctuations, because a slower poleward transport of saline water from 
the tropical Atlantic makes it difficult to speed up the sinking in the 
subpolar North Atlantic except through slower processes: The surface 
water could slowly become more saline through the reduction of fresh 
water outflow from land glaciers and from the Arctic Ocean”*. The 
northward transport of warm and saline water increased more rapidly 
since 1999, and started a negative feedback as the warm surface water 
increased glacier melt and freshwater outflow. The previous AMOC_down 
subphase of 1965-1974 started with the gradual freshening of the North 
Atlantic waters, as can be inferred from the decreasing salinity in the 
subpolar region, braking the AMOC. Incidentally, both SSH at 41° N 
and RAPID at 26° N showed a simultaneous, short-lived 30% drop in 
AMOC strength in 2009-2010°, partially caused by an extreme negative 
episode of atmospheric North Atlantic Oscillation that affected the 
wind field® over both areas. 

Fig. 2 | The OHC linear trend in the Atlantic basin. The trend is zonally 
averaged over two periods, when AMOC is increasing (a) and decreasing 
(b). The two periods are chosen according to the observed AMOC trends 
in Fig. 3a. ISHII data are used in the first period and Scripps data are used 
in the second period. Stippling indicates areas of statistical significance at 
the 95% confidence level. The linear trend is unreliable in the Southern 
Ocean prior to 2005, and so that region is masked. 

Fig. 3 | AMOC and GSTA variations. a, Mid and subpolar latitude AMOC 
strength, as calculated at 41° N using altimetry measurements, from 
ref. '®* (red, two-year running mean, Sverdrup scale shown on the right); 
inferred from integrated subpolar salinity in 0-1,500 m and 45-65° N in 
the Atlantic as a proxy, using the ISHII (dark blue) and Scripps (purple) 
datasets, with a two-year running mean. The green curve is the subpolar 
salinity, similarly calculated but using EN4. The AMOC fingerprint® 
(dark blue) and the accumulated sea-level index (turquoise) calculated 
from historical tide gauge measurements” were smoothed with 10-year and 
7-year low-pass filters, respectively, from their sources. The subpolar gyre 
SST index”! in orange is also a two-year running mean. See Methods for 
details. The inset shows RAPID-measured AMOC at 26° N. b, Shown are 
GSTA from HadCRUT4.6 (black), the nonlinear secular trend (close to the 
100-year linear trend) (brown) and variation about the trend for timescales 
longer than decadal (multidecadal variability (MDV), red). The inset 
shows the SST spatial pattern associated with MDV obtained by regressing 
SST onto its time series. The blue curve is the smoothed version of GSTA 
obtained as the sum of the secular trend and MDV. The faint lines around 
the solid lines are from 100 ensemble members of the HadCRUT4.6, which 
assess the range of uncertainty of the data used in the solid lines. 

Water masses in the subpolar and subtropical gyres are different and 
transports across gyre boundaries need not be continuous“. For vertical 
heat subduction, it is mainly the subpolar AMOC that is our focus 
in Fig. 3a. Signals from salinity proxies at the subpolar Atlantic have 
almost reached the previous low. The subpolar gyre SST has started to 
warm. The deep Labrador Sea density, which is known to lead by 7-10 
years changes in wider basin AMOC!**, has stopped declining since 
2014 (Extended Data Fig. 4). The subtropical region is more prone 
to higher-frequency perturbations!, and the RAPID time series is 
experiencing its short-term oscillations (two so far) after the recovery 
from the large dip in 2010 so the decadal trend may be difficult to see. 
Nevertheless, it appears to have stabilized at that latitude. Previously, 
when AMOC reached its lowest AMOC- value after 1975, that level 
phase lasted two and a half decades. Although we have data only for one 
cycle, its observed non-sinusoidal pattern characterized by a prolonged 
flat minimum separated by steep peaks is as expected from the physical 
arguments presented above. 

The longer Global-mean Surface Temperature Anomaly (GSTA) 
record shown in Fig. 3b, together with its low-frequency variation™*”>, 
consists of a secular trend and a multidecadal variability (MDV), 
defined to be on timescales that are decadal or longer. The spatial 
pattern associated with MDV (inset to Fig. 3b) has the pattern of an 
interhemispheric seesaw in the Atlantic, with the North Atlantic being 
the centre of action, consistent with model results?®. When the MDV 
is increasing it doubles the GSTA warming rate over the 100-year trend 
of 0.08 K per decade, and is associated with a period of rapid warm- 
ing in the late and also the early twentieth century. That secular trend 
of 0.08 K per decade, statistically significant at over 95% confidence 
level against a second-order autoregressive (AR(2)) red noise, has been 
attributed to the underlying anthropogenic global warming trend”’. 
The regressed spatial pattern associated with the secular trend resem- 
bles the model-predicted response from greenhouse warming”*”>. The 
MDV in the GSTA is related to the Atlantic Multidecadal Oscillation 
(AMO) (see Methods), the latter having a record extending back several 
hundred years. 

The previous period of low overturning in the AMOC- phase, from 
1975 to the 1990s, coincided with a period of rapid global warming at 
the surface. This is more than a coincidence because the energy budget 
involved can be quantified. We do not have reliable subsurface data for 
the period when the surface warming was rapid. However, the change 
from that period can be quantified so that an estimate can be made for 
what would happen if that change were absent. During 2000-2005, in 
the AMOC_up subphase, 52% of the global increase between 200 m and 
1,500 m is sequestered in the Atlantic. Together with the heat sequestered 
in the Southern Ocean, it contributed to a period of global warming 
slowdown. When this additional heat storage is absent, a period of 
rapid surface warming is expected to reoccur. 

Fig. 4 | Contrasting thermosteric SSH* patterns for increasing and 
decreasing AMOC. a, c, Linear SSH* trend when AMOC is increasing; 
b, d, Linear SSH* trend when AMOC is decreasing. a and b show SSH* 
from remote sensing, compared with the steric sea level calculated using 
OHC in c and d. SSH* is SSH with its global mean subtracted, reflecting 
mostly the thermosteric part of SSH (see Methods). 

Although the Argo programme was launched around 2000, its coverage 
in the Southern Ocean did not become adequate until 2005. To validate 
the data on OHC we compare satellite SSH* (the asterisk indicates the 
deviation of SSH from its global mean) available since 1993 (Fig. 4a and 
b) to the thermosteric sea level rise (due to thermal expansion of the 
water column) (Fig. 4c and d) calculated using OHC above 1,500 m. 
The comparison is surprisingly good north of 35° S. Notable exceptions 
are as expected; they include areas with no Argo measurements: shallow 
maritime areas west of the Caribbean islands, and the deep mid-Atlantic 
Ocean below 1,500 m, which was not included in our OHC. South 
of 35° S the linear trend in the Argo data is not reliable across 2003 
during the transition from no-Argo to Argo measurements”*. The two 
datasets consistently show that in the subpolar Atlantic there is increasing 
(decreasing) heat storage when AMOC is increasing (decreasing). 
The southward (northward) displacement of the Gulf Stream at 
mid-latitudes created some compensating cooling (warming)”!. In the 
AMOC’s rapidly decreasing subphase, some heat is entrained in the 
subtropical gyre. The Southern Hemisphere north of 35° S is mostly 
featureless. South of 35° S, mesoscale patterns of warming can be seen 
in SSH*, which are also reflected in the OHC after 2004, but not before, 
owing to data quality. These mesoscale eddies in the linear trend 
occurring south of the Antarctic Circumpolar Current may be due 
to its recent strengthening, and its increased baroclinic instability”. 

The increased sea level (Fig. 4b) and warmer SST (Extended Data 
Fig. 5d) in the western subtropical Atlantic may have contributed to the 
strength and destructive power of hurricanes, and to the surprising string 
of category-5 hurricanes making landfall towards the end of the decreasing 
phase of the AMOC, instead of at the peak of the AMOC, when the 
mean SST of the entire North Atlantic is the warmest and the basin- 
wide hurricane number is the highest*”. 

Climate-model runs under preindustrial conditions demonstrated 
the existence of multidecadal variation in AMOC, and its associated 
Atlantic SST variation: the AMOC+ (AMOC-) phase corresponds 
to warm (cold) SST and Northern Hemisphere mean surface temperature*!°. 
This prevailing paradigm has permeated popular perceptions 
about the future climate consequence of an AMOC weakened by global 
warming, similar to the abrupt switch back into icy conditions of the 
Younger Dryas during the last deglaciation”. Over the past few dec- 
ades, however, there is a positive trend of warmer subsurface water 
in the subpolar Atlantic (Extended Data Fig. 6), rendering the mean 
state lighter (see the temperature-salinity diagram in Extended Data 
Fig. 7). Deep convections can now carry more heat downward. In the 
presence of greenhouse heating from above and warmer SSTs, AMOC’s 
role in sequestering heat becomes important in the current global sur- 
face energy budget (Fig. 1). When AMOC is more constant, as in the 
AMOC- phase, little additional heat is sequestered in the Atlantic, 
contributing to a more rapid surface warming as more heat from radi- 
ative imbalance remains on the surface and the upper 200 m of the 
global oceans. We note, however, that we have discussed here only one 
component of a complex system: global heat balance is maintained by 
the combined ocean and atmosphere systems and a change in the transport 
of one regional component may affect the partitioning of change 
between other parts of the ocean or of the atmosphere, depending on 
the timescales involved. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0320-y. 

METHODS 

Updated AMOC indices. We reproduced the unfiltered monthly AMOC indi- 
ces (Extended Data Fig. 1). Their correlation coefficient with Zhang’s unfiltered 
AMOC fingerprint is listed on the right. All correlations are statistically significant 
at over 95% confidence level. 

AMOC indices in Fig. 3a. Extended Data Fig. 1 shows that all unfiltered AMOC 
proxies used in Fig. 3a are correlated with Zhang’s fingerprint AMOC proxy at over 
95% confidence level. Zhang showed” that in the Geophysical Fluid Dynamics 
Laboratory model the fingerprint proxy is highly coherent with the model AMOC 
Index, defined as the zonal integrated maximum Atlantic overturning at 40° N, at 
decadal and multidecadal scales. This is the reason that the fingerprint is shown 
smoothed with a 10-year low-pass filter. This fingerprint is calculated using the 
detrended 400-m subsurface temperature. (It was updated to 2017 by the author 
with permission to use.) 

Our subpolar upper ocean salinity index is defined as the average over 45°-65° N 
in the Atlantic basin and integrated over 0-1,500 m. The two undetrended salinity 
indices shown in Fig. 3 and Extended Data Fig. 1 are from three data sources. The 
first index is based on ISHII and Scripps. ISHII data have not been updated since 
2012 and Scripps data are only available since 2004; they are connected at 2012 
when calculating the correlation coefficient with Zhang’s fingerprint AMOC proxy. 
The second salinity index is from EN4 (version 4.2.1). 
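The definition of the salinity index can be sketched for a single time step as follows. This is a minimal illustration, not the processing applied to the ISHII, Scripps or EN4 products: the function name, the array layout, the cosine-of-latitude area weighting, the treatment of missing values and the trapezoidal vertical integration are all assumptions made here.

```python
import numpy as np


def subpolar_salinity_index(salinity, depth_m, lat_deg, atlantic_mask,
                            lat_bounds=(45.0, 65.0), max_depth_m=1500.0):
    """Area-weighted mean salinity over the latitude band, integrated in depth.

    salinity      -- array (n_depth, n_lat, n_lon), NaN where there is no data
    depth_m       -- increasing depths of the levels, in metres
    lat_deg       -- latitudes of the grid rows, in degrees north
    atlantic_mask -- boolean array (n_lat, n_lon), True inside the Atlantic
    Returns the index in psu m. Integration stops at the deepest level that
    does not exceed max_depth_m; every level is assumed to hold some data.
    """
    in_band = (lat_deg >= lat_bounds[0]) & (lat_deg <= lat_bounds[1])
    region = atlantic_mask & in_band[:, None]
    # Grid-cell area is taken to be proportional to cos(latitude).
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * region
    valid = np.isfinite(salinity)
    w = np.broadcast_to(weights, salinity.shape) * valid
    s = np.where(valid, salinity, 0.0)
    profile = (s * w).sum(axis=(1, 2)) / w.sum(axis=(1, 2))
    keep = depth_m <= max_depth_m
    z, p = depth_m[keep], profile[keep]
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(z)))
```

Applying the function to each monthly field yields a time series; because the index is not detrended, no further processing is implied by this sketch.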

The sea-level index was obtained as in ref. ” by calculating the sea-level differ- 
ence between the average of a group of linearly detrended, deseasonalized tide- 
gauge measurements south of 35° N and that to the north. It is accumulated in 
time, shifted to the right by 4.8 years and smoothed with a 7-year lowpass filter. 
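The steps in this construction can be sketched as below for monthly tide-gauge series. The sketch is illustrative only and rests on several assumptions that are not stated in the text: the difference is taken as south minus north, the seasonal cycle is removed by subtracting the mean of each calendar month, and a centred running mean stands in for the 7-year low-pass filter. All function names are hypothetical.

```python
import numpy as np


def detrend_deseason(x, months_per_year=12):
    """Remove a least-squares linear trend and the mean seasonal cycle."""
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)
    out = x.copy()
    for m in range(months_per_year):
        out[m::months_per_year] -= x[m::months_per_year].mean()
    return out


def running_mean(x, window):
    """Centred running mean; values near both ends are damped towards zero."""
    return np.convolve(x, np.ones(window) / window, mode="same")


def sea_level_index(south, north, time_years, lag_years=4.8,
                    smooth_years=7, months_per_year=12):
    """Accumulated, lagged and smoothed south-minus-north sea-level index.

    south, north -- lists of equally long monthly gauge series, which must
                    be longer than the smoothing window
    time_years   -- decimal years of the samples
    Returns (shifted_time, index).
    """
    south_mean = np.mean([detrend_deseason(g) for g in south], axis=0)
    north_mean = np.mean([detrend_deseason(g) for g in north], axis=0)
    accumulated = np.cumsum(south_mean - north_mean)
    smoothed = running_mean(accumulated, smooth_years * months_per_year)
    # Shifting to the right is a relabelling of the time axis.
    return time_years + lag_years, smoothed
```

The shift is applied to the time axis rather than to the values, so the order of shifting and smoothing does not matter in this sketch.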

The subpolar gyre SST index was obtained by ‘detrending’ the subpolar gyre 
SST by the subtraction of the global mean SST. It is averaged over the subpolar 
gyre region, defined by ref. 7). 

Willis’s AMOC strength at 41° N was calculated'* using altimetry SSH 
measurements and the geostrophic approximation for the zonal-mean northward 
velocity vertically integrated above 1,130 m. It is not detrended or accumulated. 

Error bars for data used in Fig. 3. The error bars for the salinity time series used 
in Fig. 3a are plotted in Extended Data Fig. 2. The uncertainty at each gridpoint is 
provided by each data source: ISHII, Scripps and EN4. The error bar of the salinity 
time series at each time is computed as the combination of the gridpoint uncer- 
tainty and one standard deviation due to the averaging in space. The uncertainty 
of the SSH-deduced AMOC strength was given by ref. '*. The measurement and 
sampling errors at each time gridpoint were ±12%. The uncertainty of tide-gauge 
data was discussed by ref. ”, and that of Zhang’s fingerprint proxy by ref. *°. The 
uncertainty of the global surface temperature data from HadCRUT4.6 was assessed 
by the data source using 100 ensemble members that span the uncertainty range 
of the data. 

Calculation of warming scenarios. We emphasize that this is not a prediction, 
but a scenario calculation. In our current climate system, the OHC in the upper 
1,500 m of the global oceans increases at the rate of 0.42 W m⁻², which is 
approximately the top-of-atmosphere radiative imbalance. Apart from short-term 
variations of radiative imbalance such as those due to volcanic eruptions, it is 
reasonable to assume that for the next two decades there will not be an appreciable 
change in radiative imbalance, barring an unexpected development of carbon 
sequestration technology. 

Scenario 1. If the OHC storage below 200 m remains the same (no increases), then 
the radiative imbalance of 0.42 W m⁻² heats only the top 200 m of the global 
oceans. That is, the increase of OHC in the top 200 m of the oceans is responsible 
for the increase in the entire 1,500 m of the column. The top 200 m of the global 
ocean then warms at the rate calculated as: 0.42 W m⁻² divided by the heat capacity 
of 200 m of the ocean = 0.23 °C per decade. This is equivalent to that obtained for 
a ‘slab’ ocean 200 m thick. 

Scenario 2. As for Scenario 1 except that only the Atlantic and the Southern oceans’ 
heat content below 200 m remain the same for the next two decades. The Pacific 
and the Indian oceans continue to increase their OHC at the current rate. The 
warming rate is 70% of that for Scenario 1 because at present the Atlantic and 
the Southern oceans together are responsible for 70% of the OHC increase in the 
upper 1,500 m of the oceans. This is probably the more likely scenario because we 
have argued in the main text that AMOC is likely to remain relatively constant 
during the next two decades. The subsurface Southern Ocean has been warming 
since at least 1993, caused by the southward displacement and intensification of 
the westerly jet, which cannot continue much longer, first because the proposed 
cause (the ozone hole) has diminished in importance as the ozone hole heals, and 
second because there is not much more room for the jet’s southward displacement. 
So the increase in warming will probably stop. 
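The arithmetic behind the two scenario rates can be sketched as follows. The inputs below are illustrative assumptions and are not given in the text: a seawater density of 1,025 kg m⁻³, a specific heat of 3,990 J kg⁻¹ K⁻¹, and a radiative imbalance expressed per unit area of the whole Earth surface and taken up by oceans covering 71% of it. The function name slab_warming_rate is hypothetical.

```python
SECONDS_PER_DECADE = 10 * 365.25 * 24 * 3600
SEAWATER_DENSITY = 1025.0        # kg m^-3, assumed
SEAWATER_SPECIFIC_HEAT = 3990.0  # J kg^-1 K^-1, assumed
OCEAN_AREA_FRACTION = 0.71       # assumed share of the Earth surface


def slab_warming_rate(imbalance_w_m2, depth_m=200.0,
                      ocean_fraction=OCEAN_AREA_FRACTION):
    """Warming rate (degrees C per decade) of a slab ocean of given depth."""
    # Heat capacity of the slab per unit ocean area, in J m^-2 K^-1.
    heat_capacity = SEAWATER_DENSITY * SEAWATER_SPECIFIC_HEAT * depth_m
    # Global-mean imbalance concentrated onto the ocean-covered area.
    flux_into_ocean = imbalance_w_m2 / ocean_fraction
    return flux_into_ocean * SECONDS_PER_DECADE / heat_capacity


# Scenario 1: no further increase in storage below 200 m anywhere.
scenario_1 = slab_warming_rate(0.42)

# Scenario 2: 70% of the Scenario 1 rate, as described in the text.
scenario_2 = 0.7 * scenario_1

print(round(scenario_1, 2), round(scenario_2, 2))
```

With these assumed inputs the sketch gives about 0.23 °C per decade for Scenario 1 and about 0.16 °C per decade for Scenario 2, matching the values quoted in the text to two decimal places; different choices for the assumed constants would shift the result slightly.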

Model AMOC and reconciliation with recent observations. Observational results 
in Fig. 3a show that there was a positive trend from 1993 to 1999, with a small 
peak in 1996. The rapid rising trend from 1999 to 2005 is statistically significant 
at above the 95% confidence level. This is seen in all proxies, most clearly in the less 


smoothed data (SSH and subpolar salinity). This claim is supported by observation 
of SSH-deduced AMOC strength, tide-gauges, the subpolar salinity proxy, and also 
the Zhang fingerprint proxy. (The last proxy, because of 10-year smoothing, does 
not show the smaller peak in the mid-1990s). A model reanalysis also showed an 
acceleration prior to 2005 followed by a decline at 26° N, and a peak in the mid- 
1990s as well as one in 2005 at 45° N'°. AMOC in models is sensitive to resolution 
and subgrid parameterization*!, resulting in little consensus among reanalysis 
(and hindcast) products. With one exception’® these products do not agree with 
the RAPID observation at 26° N. The exception is the GloSea5 model, which has 
a higher, eddy-permitting resolution than previous reanalyses. Supplementary 
figure 1 of ref. 1° shows two peaks, one at 1995 and one at 2005. The 1995 peak 
is slightly higher than the 2005 peak, and is referred to thus in the main text of 
ref. '®: “The AMOC at 45° N is representative of the changes in the subpolar gyre, 
with the AMOC decreasing from a maximum in the mid-1990s, followed by a 
slight increase (Fig. 1d)”. The peak in 2005 was not mentioned. However, the result 
on the 1995 peak should be treated with care, as the authors themselves stated in 
the supplementary information of ref. 1°: “It is likely that there will be a period 
of spinup, where the deep ocean (where there are few observational constraints) 
adjusts, which may explain the divergence in trend. Hence we disregard the first 
few years of each experiment. There is also a shock in 1992 when the altimeter data 
is introduced, which may contribute to the increase in AMOC strength between 
1989 and 1995. Hence we choose the period to analyse starting from January 1995, 
and join the two analyses in January 2002.” The relative magnitude of the 1995 peak 
and the 2005 peak may be unreliable as it was obtained by joining two reanalyses, 
one starting from 1989 and one from 1995 with “divergence in trend”’®. 

The observed SSH data since 1992 can be used to deduce AMOC strength 
using geostrophic approximation, bypassing the problems of shock and subsequent 
adjustment when the same SSH data were introduced in model assimilation. 

SST changes during different phases of AMOC. The upper branch of the 
climatological AMOC brings warm and saline surface water from the subtropical 
North Atlantic to its subpolar latitudes. When the overturning is stronger, more 
of this warm water is found in the subpolar northern latitudes. In the Southern 
Hemisphere, more of the cold water from the region of the Antarctic Circumpolar 
Current is brought northward into the Southern subtropics. Consequently a char- 
acteristic signature in the Atlantic SST is an opposite-signed multidecadal anomaly, 
with warming to the north and smaller cooling to the south when the overturn- 
ing is stronger (AMOC-+), and the reverse pattern when it is weaker (AMOC-) 
(Extended Data Fig. 5a, b). This ocean-induced SST variability is centered in the 
subpolar North Atlantic”®. The observed tendency during the last two subphases 
of the AMOC is as expected (Extended Data Fig. 5c, d): As AMOC slows after 
2005, the SST tends towards a cooler North Atlantic and warmer subtropics. 
Accompanying the strong cooling in the subpolar gyre is an interesting intense 
warming after 2005 in the northwest Atlantic, centered in the Gulf of Maine, which 
was recently simulated in a high-resolution climate model” as due to the 
northward displacement of the Gulf Stream when AMOC slows. The inverse relationship 
between the Gulf Stream’s northward displacement and AMOC strength was found® 
to be caused by the Labrador Current retreat and the bottom vortex stretching*. 

AMO. In long coupled atmosphere-ocean model runs under preindustrial 
conditions (without increasing greenhouse gases) the AMO is the SST manifestation of 
AMOC variations, and the two time series are approximately in phase!’. The defi- 
nition of AMO in ref. !° is the mean of Atlantic SST north of 45° N, which may lead 
the subtropical SST anomaly by two years. A more traditional definition of AMO 
is the mean Atlantic SST north of the Equator*4, with an approximately one-year 
phase difference. It has been shown”, using the space-time perspective of rotated 
empirical orthogonal function analysis, that the AMO is mainly responsible for the 
observed global mean surface temperature variation on multidecadal timescales. 
The two are in phase during the industrial era. Since the AMOC and the global 
mean surface temperature variation are not in phase (as shown in Fig. 3), it follows 
that during the industrial era, AMOC and AMO are out of phase, possibly by a quarter 
cycle, although AMOC’s time series is too short for an accurate determination of 
the phase information. 

During the positive phase of AMO, SST is warm in the North Atlantic and sur- 
rounding continents. Therefore, Northern Hemisphere mean surface temperature 
is warm during the positive phase and cool during the negative phase of the AMO. 
Using multiproxy data in the Northern Hemisphere the AMO time series can 
be extended back several hundred years**. The longest instrumental temperature 
record exists in central England, and it was used?’ to reconstruct the AMO time 
series back to the Little Ice Age. An even longer record of ice cores in Greenland, 
in the northern Atlantic, exists, and an AMO signal, statistically significant at 
above the 95% confidence level, can be found” extending back to AD 800 that is coherent 
with the instrumental record of central England?’ during their overlapping period. 
It appears that AMO is a recurrent phenomenon of period around 65-70 years and 
that it is robust in the preindustrial era, with the Atlantic and the surrounding areas 
warm during the positive phase and cold during the negative phase. From climate 
model preindustrial control runs, it seems that AMO is a surface manifestation of 
AMOC variation. Furthermore, based on palaeoclimate evidence of cold events 
when AMOC slows down abruptly, a common perception is that a slowdown in 
AMOC would lead to a cold Northern Hemisphere. The mechanism relies on the 
dominant role of AMOC (and its Gulf Stream) in horizontally transporting sur- 
face heat from the tropics to the mid- and high-latitude Atlantic, where it releases 
some heat to the cold atmosphere before sinking in the subpolar Atlantic. The 
heat released to the atmosphere makes Europe warmer (when wind blows in that 
direction) than it should be for its latitude. 

Calculating SSH* from altimetry data. SSH* is SSH with its global mean sub- 
tracted. SSH contains both the thermosteric part (due to thermal expansion of 
the entire water column) and the ocean water mass addition that is due to melting 
land ice. It is known that the ocean will adjust to any change in ocean mass rapidly 
through the propagation of gravity waves, and will reach a new equilibrium globally 
within a couple of months*’. Therefore, the subtraction of the global mean largely 
removes the mass contribution from SSH. 
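The subtraction of the global mean can be sketched for one gridded altimetry map as below. This is a minimal illustration: the function name ssh_star, the cosine-of-latitude area weighting and the use of NaN for land or missing cells are assumptions made here, not details given in the text.

```python
import numpy as np


def ssh_star(ssh, lat_deg):
    """Return SSH minus its area-weighted global mean.

    ssh     -- array (n_lat, n_lon) of sea-surface height, NaN where missing
    lat_deg -- latitudes of the grid rows, in degrees
    Cells are weighted by cos(latitude); missing cells get zero weight.
    """
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.isfinite(ssh)
    global_mean = np.nansum(ssh * weights) / weights.sum()
    return ssh - global_mean
```

Applying the function to each monthly map before fitting the linear trend at every grid point gives a trend pattern whose area-weighted mean is close to zero.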

Data availability. The datasets used in this study are all publicly available. They 
are: (1) ISHII data version 6.13, the objectively analysed subsurface temperature 
and salinity at 24 levels in the upper 1,500 m during 1945-2012 
(http://rda.ucar.edu/datasets/ds285.3/); (2) Scripps gridded Argo data, objectively 
analysed subsurface temperature and salinity at 58 levels in the upper 1,950 m 
since 2004 (http://www.argo.ucsd.edu/Gridded_fields.html), which is based on Argo 
data collected and made freely available by the international Argo project and the 
national programmes that contribute to it; Argo float data and metadata are 
available from the Argo Global Data Assembly Centre 
(https://doi.org/10.17882/42182); (3) EN4 data version 4.2.1, objectively analysed 
subsurface temperature and salinity at 42 levels in the upper 5,350 m since 1900 
(https://www.metoffice.gov.uk/hadobs/en4/download-en4-2-1.html); (4) Sea surface 
height based on satellite altimetry from the Archiving, Validation, and 
Interpretation of Satellite Oceanographic Data (AVISO) 
(https://www.aviso.altimetry.fr/en/data.html); (5) Tide gauge records from the 
Permanent Service for Mean Sea Level (PSMSL) (http://www.psmsl.org); (6) Extended 
Reconstructed Sea Surface Temperature (ERSST, version 3b) 
(http://www1.ncdc.noaa.gov/pub/data/cmb/ersst/v3b/netcdf); (7) RAPID AMOC at 
26.5° N (http://www.rapid.ac.uk/rapidmoc/rapid_data/); (8) Ref. !°, updated by 
the author (ftp://oceans-ftp.jpl.nasa.gov/pub/jwillis/AMOC/2016/). 

Code availability. Scripts for analysing the data are available from the correspond- 
ing author upon reasonable request. 
Extended Data Fig. 1 | Unfiltered AMOC proxy time series in monthly
resolution. The thick solid lines are 13-month running means. The
numbers to the right of each time series show the correlation coefficient
with the unfiltered AMOC subsurface temperature fingerprint of Zhang.
Data are taken from refs *?*. All of the correlation coefficients are above
95% confidence level. The accumulated sea-level index is shifted to the
right by 4.8 years in this figure. Without the time shift, its correlation with
the AMOC proxy is practically zero (r = 0.06).
Extended Data Fig. 2 | Error bars for the three salinity time series shown in Fig. 1. The colour lines are monthly values of uncertainty, superimposed
on the 13-month means of the time series. psu, practical salinity units.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Figure: panel a, global mean temperature anomaly versus year (1950-2010), with phases labelled slowdown, rapid warming, slowdown.]


Extended Data Fig. 3 | Coincidence of the three AMOC phases with global warming slowdown and acceleration. a, Global mean surface temperature. 
b, OHC north of 45° N in the Atlantic. c, Salinity north of 45° N in the Atlantic. 
the 1,000-1,500 m layer of the Labrador Sea, regionally averaged over the ocean area shown in the inset, from the three data sources given. A leading signal for stronger AMOC is the increased deep Labrador Sea salinity
boundary at depth, changing the cross-basin zonal gradient, and hence the geostrophic southward velocity'’. The return flow then strengthens the upper branch of AMOC with a lag of 7-10 years'>"®.
Extended Data Fig. 5 | SST patterns during different AMOC phases. a, When AMOC is below climatology. b, When AMOC is above climatology, SST
detrended. c, SST linear trend when AMOC is increasing. d, When AMOC is decreasing.
Extended Data Fig. 6 | Linear trends, from 1950 to 2017, of temperature, salinity and density. a-c, Trends in temperature (a), salinity (b) and density
(c) as a function of depth. Solid curves indicate where the trend is statistically significant at 95% confidence level.
Extended Data Fig. 7 | Temperature-salinity diagram. The subpolar
Atlantic Ocean (45°-65° N) for each depth between 300 m and 1,500 m for
the two periods, with the mean of 2000-2016 in red and the mean of 1920-1940
in blue. The dots shown are the five winter month values (NDJFM).
At these depths the seasonal cycle is very small**.




https://doi.org/10.1038/s41586-018-0273-1 


An inverse latitudinal gradient in speciation rate for marine fishes
Far more species of organisms are found in the tropics than in
temperate and polar regions, but the evolutionary and ecological 
causes of this pattern remain controversial!?. Tropical marine 
fish communities are much more diverse than cold-water fish 
communities found at higher latitudes**, and several explanations 
for this latitudinal diversity gradient propose that warm reef 
environments serve as evolutionary ‘hotspots’ for species 
formation**. Here we test the relationship between latitude, species 
richness and speciation rate across marine fishes. We assembled a 
time-calibrated phylogeny of all ray-finned fishes (31,526 tips, of 
which 11,638 had genetic data) and used this framework to describe 
the spatial dynamics of speciation in the marine realm. We show that 
the fastest rates of speciation occur in species-poor regions outside 
the tropics, and that high-latitude fish lineages form new species 
at much faster rates than their tropical counterparts. High rates of 
speciation occur in geographical regions that are characterized by 
low surface temperatures and high endemism. Our results reject 
a broad class of mechanisms under which the tropics serve as an 
evolutionary cradle for marine fish diversity and raise new questions 
about why the coldest oceans on Earth are present-day hotspots of 
species formation. 

The steep decline in species richness from the equator to the poles is 
one of the most general large-scale patterns in biology®!” and has existed 
in its general form for more than 30 million years'!. Many proposed 
mechanisms for this latitudinal diversity gradient (LDG) explain high 
tropical diversity as the outcome of faster rates of species origination: 
the tropics are an evolutionary cradle for new species, and the gradient 
reflects—at least in part—lower rates of species formation in regions 
outside the tropics’!”. Studies on fossil mollusks!’, plankton! and 
corals° support the hypothesis that rates of marine species formation 
are faster in the tropics than at higher latitudes. 

We tested whether latitudinal variation in the rate of speciation can 
explain the LDG in marine fish diversity by reconstructing speciation 
rates across fishes and analysing them in a geographical context. We 
focused explicitly on recent speciation rates*!*!>, because extinction 
reduces our ability to infer rates deep in the past’®. We also ignored 
phylogenetic estimates of extinction rates, given the unreliable nature of 
these parameters in phylogenetic diversification models”. If speciation 
rates are controlled by energy—perhaps owing to accelerated chemical 
reactions, life histories or mutation rates!*!°—then we should observe 
a footprint of rapid speciation in the distribution of recent speciation 
times for tropical taxa. 

We assembled a distribution of all-taxon assembled (ATA) time- 
calibrated phylogenetic trees of ray-finned fishes (31,526 species). 
The ATA phylogenies include 11,638 species with genetic data (5,231 
marine species); the remaining 19,888 species that did not have genetic 
data were placed using stochastic polytomy resolution (Methods) to
generate taxonomically consistent resolutions of all taxa without genetic
data under a conservative constant-rate birth-death process. The ATA 
trees were time-calibrated using a database of 139 fossil taxa (Extended 
Data Fig. 1 and Supplementary Information). We estimated or com- 
piled geographic ranges for the majority of known marine species, 
including all species with genetic data. We estimated speciation rates 
across the phylogenies using BAMM”*, a Bayesian framework for 
reconstructing complex evolutionary dynamics from phylogenetic 
trees, and DR, a summary statistic that infers recent speciation rates 
for all tips in the phylogeny without requiring a formal parametric 
inference model”!. We denote these two analyses of speciation rates as 
λ_BAMM and λ_DR, respectively. The λ_BAMM and λ_DR rates include substantial
historical information and are best interpreted as the rate of
lineage splitting averaged across the past 10-20 million years (Myr);
units for speciation presented here are per-lineage rates in units of lineages
per Myr. We also computed a simple interval-based measure of
speciation rate for a series of path intervals from 0.25 Myr to 50 Myr
before present”’, providing a window of reliability for λ_BAMM and λ_DR.
Estimates of λ_DR were computed across the distribution of ATA phylogenies,
thus generating rate estimates conditional on the uncertainty
in placements of taxa without genetic data. λ_BAMM was estimated from
the primary dated tree including all taxa with genetic data (n = 11,638),
and incomplete sampling was incorporated by using family-specific 
sampling fractions. 

Consistent with previous studies*“, we find a strong LDG in marine 
fish diversity, with an extreme richness peak in the Coral Triangle of 
the tropical Indo-Pacific Ocean (Fig. 1a). However, analysis of per-cell 
mean speciation rates reveals a notable inverse relationship between 
the rate of species formation and latitude (Fig. 1b-e). Mean speciation
rate for cell assemblages from tropical regions (<23.5°; n = 6,698
cells) was λ_BAMM = 0.08 (λ_DR = 0.11) and the corresponding rate for
high-latitude (>45°; n = 4,347 cells) assemblages was λ_BAMM = 0.14
(λ_DR = 0.16). These rate differences are substantially greater when
comparing more species-rich assemblages from continental shelf
and slope regions: shallow (mean depth <2,000 m) tropical cells have
λ_BAMM = 0.08 (λ_DR = 0.11; n = 833), whereas corresponding high-latitude
cells have λ_BAMM = 0.18 (λ_DR = 0.22; n = 1,182). We computed
means for 232 marine biogeographic ecoregions—encompassing the 
Earth’s shallow and coastal regions—and used spatial simultaneous 
autoregressive (SAR) models with breakpoints to assess the relation- 
ship between latitudinal position and speciation rate. Regardless of 
how regional mean rates are computed, all SAR models have highly 
significant effects of latitude on speciation rate (P < 0.001; n= 232 
regions; Extended Data Fig. 2). In general, for latitudes greater than 
25° north or south, each ten-degree increase in latitude increases the 
assemblage-wide speciation rate by approximately 0.025 lineages per 
Myr. However, speciation rate is effectively decoupled from latitude 
for tropical and subtropical regions (Extended Data Fig. 2h). General
results reported here are robust across all of the measures of speciation
rate and associated weighting schemes that we considered (Extended
Data Figs. 2, 3).

Fig. 1 | Latitudinal gradient in species diversity and speciation rate in marine
fishes. a, b, Mean species richness (a) and speciation rate (b) for marine fish
assemblages at the global scale. c, d, Marginal distributions of richness (c) and
speciation rate (d) with respect to latitude (n = 16,150), with cell colours
corresponding to scale bars in a, b. e, Mean speciation rates for endemic taxa only
(n = 2,698). Results shown for λ_BAMM but similar results are obtained for λ_DR
(Extended Data Figs. 2, 3). Grid cell size is 150 x 150 km for all panels.

Speciation rate is strongly and negatively associated with both species 
richness (Fig. 2a) and annual sea surface temperature (Fig. 2b), although 
sea surface temperature is highly correlated with latitude (r= -0.95 
across 16,150 grid cells). Regional assemblages of fishes with the fastest 
rates of speciation occur at the highest latitudes and are characterized 
by cold surface temperatures (Fig. 2 and Extended Data Fig. 4). The 
south polar seas, dominated by the in situ radiation of highly specialized 
and geographically restricted icefishes and their relatives”, are charac- 
terized by the fastest overall rates of species formation of any marine 
region on Earth. Continental shelf and slope assemblages from the 
Southern Ocean surrounding Antarctica have mean speciation rates 
of λ_BAMM = 0.27 and λ_DR = 0.26 (n = 179 cells); these rates substantially
exceed those observed for the Coral Triangle (λ_BAMM = 0.08 and
λ_DR = 0.11; n = 220 cells), despite a mean 62-fold difference in per-cell
species richness for these regions. Assemblages from the Arctic also
have high speciation rates (λ_BAMM = 0.17 and λ_DR = 0.24; n = 511 cells),
despite little overlap between the clades that comprise the northern and 
southern polar faunas”. There is a strong positive relationship between 
several analyses of regional endemism and assemblage-wide speciation 
rate (Fig. 2c and Extended Data Fig. 4e; n = 60 regions). The correlation 
between speciation rate and endemism is high overall (λ_BAMM, r = 0.81; λ_DR, r = 0.79).
The Mediterranean Sea is a clear outlier with respect to this overall pat- 
tern, combining high endemism with relatively low speciation (Fig. 2c). 
This suggests that the factors contributing to endemism per se are not 
necessarily those that promote fast speciation. 

As an alternative to the analysis of mean speciation rates by grid cell 
and biogeographical region (Figs. 1, 2), we analysed λ_BAMM and λ_DR
for individual fish species with respect to their latitudinal midpoint. 
High-latitude fish clades are characterized by rapid speciation relative 
to low-latitude and reef-associated clades, and there is a strong rela- 
tionship between the centroid midpoint of the geographic range for 
each species and its estimated rate of species formation (Fig. 3 (inset) 
and Extended Data Fig. 5). We formally tested the relationship between 
latitudinal midpoint and speciation rate using several methods that 
are robust to model misspecification and phylogenetic pseudoreplication”*°.
The correlation between absolute latitudinal midpoint and
λ_DR is 0.27 (P < 0.001); similar results are obtained for λ_BAMM and
latitude (r = 0.3; P = 0.006). Across a range of latitudinal thresholds,
we find a highly significant difference in speciation rate for high- and
low-latitude fishes (P < 0.001 across all thresholds), with cold-temperate
and polar lineages speciating approximately twice as fast as
the average low-latitude lineage (Extended Data Table 1). 




Species with latitudinal midpoints in the tropics (23.5° S to 23.5° N; 
n = 3,461) have mean speciation rates of λ_BAMM = 0.09 and λ_DR = 0.12.
By contrast, species with latitudinal midpoints greater than 45° N or
45° S (n = 574) have λ_BAMM = 0.20 and λ_DR = 0.25. These rates are even
more extreme for subpolar and polar taxa: across fishes in our dataset
with latitudinal midpoints greater than 60° (n = 122), mean speciation
rates were λ_BAMM = 0.29 and λ_DR = 0.35. Interval-based estimates of
speciation rate” indicate that the overall tropical-temperate-polar gradient
that we report here has been present for millions of years, extending
back in time at least to the Miocene/Pliocene boundary (Extended
Data Fig. 6). 

Reef-associated clades, which comprise a substantial fraction of 
the tropical diversity peak, are not characterized by exceptional rates 
of species formation. Three of the largest such clades—the wrasses, 
damselfishes and gobies—collectively account for approximately 
3,000 species, yet have low to moderate rates of speciation estimated 
using BAMM (wrasses: λ_BAMM = 0.10; gobies: λ_BAMM = 0.07; damselfishes:
λ_BAMM = 0.12) and DR (wrasses: λ_DR = 0.12; gobies: λ_DR = 0.10;
damselfishes: λ_DR = 0.14). By contrast, temperate and polar fish faunas
are dominated by members of multiple clades that have exceptionally 
high rates of species formation (Fig. 3), including snailfishes, eelpouts, 
Sebastes rockfishes and Antarctic notothens (icefishes and allied spe- 
cies). These coldwater taxa are characterized by speciation rates that 
exceed 0.26 (λ_BAMM) and 0.34 (λ_DR). With the possible exception of
gobies, we find little evidence for early bursts of speciation during the 
radiations of major tropical and reef-associated clades across the past 
20-60 Myr (Extended Data Fig. 6). We note that 79.7% of marine spe- 
ciation events in our ATA phylogenies are inferred to have occurred 
after the Oligocene/Miocene boundary, suggesting that the timescales 
over which we have estimated speciation rates are relevant to the origin 
and maintenance of the modern LDG in marine fishes.

An alternative explanation for the global gradient in speciation rates 
that we report involves environmental or biogeographical filtering on 
traits that are also associated with rapid speciation. For example, per- 
haps speciation rates are most rapid for fishes that inhabit cold and dark 
bathyal or abyssal regions; physiological adaptations for life in those 
environments might predispose these lineages towards disproportion- 
ate representation in high-latitude communities. This hypothesis pre- 
dicts that deep-sea lineages should speciate more rapidly than shallow 
lineages, regardless of latitude. However, mean rates for high-latitude 
(>45°) deep-sea fishes are much faster than for low-latitude (<45°) 
deep-sea species (high latitude: λ_BAMM = 0.29, λ_DR = 0.37, n = 75; low
latitude: λ_BAMM = 0.15, λ_DR = 0.15, n = 218). Across all deep-sea fishes
represented in our dataset (n = 293), there is a strong positive correla- 
tion between absolute latitudinal midpoint and speciation rate (r = 0.50; 
P < 0.001). There is no effect of depth classification on speciation rate 
for tropical fishes (P > 0.25 across all classification schemes; Extended
Data Fig. 7a). A secondary prediction of the filtering hypothesis is
that high-rate, high-latitude clades should be nested within high-rate
tropical or deepwater clades. We tested this hypothesis for perciform
fishes, which account for 66.3% of high-latitude fishes (Extended
Data Fig. 7b). Perciformes include four of the most-rapidly speciating
major clades of marine fishes (Notothenioids, Sebastidae, Zoarcidae
and Liparidae), but these high-latitude clades are either nested within
other high-latitude clades or within largely tropical clades that have
low speciation rates (Extended Data Fig. 7c). The overall latitudinal
gradient in speciation rate is thus unlikely to be explained by filtering
on deepwater clades with rapid speciation rates into high-latitude
biogeographical provinces.

Fig. 2 | Species richness, temperature and speciation rate in marine
fishes for individual grid cells. a, Negative relationship between species
richness and mean speciation rate (λ_BAMM) for individual grid cells
(n = 16,150). b, Negative relationship between mean annual sea-surface
temperature and mean speciation rate for cells. c, Positive relationship
between regional endemism and mean speciation rate for all species
occurring in a particular biogeographical province (n = 60 biogeographical
provinces). Squares and circles denote provinces with latitudinal midpoints
north and south of the equator, respectively; cell colours denote latitude.
Point labelled ‘M’ in the lower right of c is the Mediterranean Sea, which is
characterized by high endemism and low speciation rate. Nearly identical
results are obtained for λ_DR and for BAMM analyses that assume time
constancy within rate regimes (Extended Data Fig. 4).

Fig. 3 | Latitudinal gradient in per-taxon speciation rate for marine
fishes. Top, BAMM-estimated speciation rates across phylogenetic tree
of 5,223 marine fishes for which genetic and geographic range data were
available. Iconic coral reef clades are indicated with single arc segments;
double segments denote high-latitude lineages that drive the overall
fast speciation rate for temperate and polar fishes. Inset box plots show
the median and interquartile range in distribution of rates (λ_DR and
λ_BAMM) for individual taxa with respect to the centroid midpoint of their
latitudinal distribution, with species values binned in 10° increments.
Bottom, phylogenetic niche conservatism in marine fish lineages as
reflected by the geographical distribution of latitudinal midpoints; each
point is the centroid midpoint of an individual species, and colours reflect
corresponding λ_BAMM estimates. Clades denoted with pink polygons are
dominant high-latitude fish clades; grey polygons are predominantly
reef-associated clades. The fish images were created by J. Johnson.

We performed a complementary set of analyses based only on pri- 
mary occurrence records from museum databases and other sources 
(see Methods). These estimates of species ranges yield highly congruent 
results (Extended Data Fig. 8). Our results are not conditional on a 
specific parametric model for inference; the terminal branch lengths 
themselves are strongly associated with latitude (Extended Data Fig. 9), 
indicating that few assumptions are required to obtain the results pre- 
sented here. Furthermore, these results cannot be explained by varia- 
tion in the completeness of taxonomic sampling with respect to latitude 
or by alternative reconstructions of geographic range (Extended Data 
Figs. 8, 9). 

Our results reject the hypothesis that rapid speciation explains 
the spectacular diversity of tropical marine shallow-water fishes and 
reveal that, paradoxically, speciation rates are fastest in the geograph- 
ical regions with the lowest species richness. Several evolutionary 
explanations for the LDG propose that fundamental relationships 
between energy and speciation rate control the accumulation of bio- 
diversity over time!®!°, and it has been said that the tropics are more 
diverse because ‘the Red Queen runs faster when she is hot”. For the 
marine fish species that were studied here, and for many terrestrial
vertebrates!, there is no evidence to support these biophysical linkages
between energy, metabolism and speciation. Faster speciation
contributes to total species richness in some island and freshwater 
lacustrine systems””®, but for larger biogeographical provinces— 
including the marine realms considered in the present study—it 
is increasingly unlikely that speciation rate variation is the primary 
cause of diversity gradients””’. Whether the rapid speciation that we 
have documented in Earth’s cold oceanic provinces reflects a recent 
and ongoing expansion of marine diversity is a key frontier for future 
research on the LDG in marine organisms. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0273-1.
METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Matrix assembly. We used the PHLAWD pipeline” to generate a 27-gene multi- 
locus alignment for ray-finned fishes (Supplementary Information). Guide align- 
ments were constructed using data from recently published studies of higher-level 
actinopterygian relationships*”. Guide alignments also included new sequences 
for 442 species of actinopterygians (Supplementary Table 2; see ‘Data availabil- 
ity’) generated using a standardized phylogenetic workflow for fishes**. PHLAWD 
produced a preliminary alignment of 15,606 species. We performed a series of 
curation steps including BLAST searches of each sequence back to GenBank to 
identify taxonomically misassigned species, taxonomic name reconciliation against 
the California Academy of Sciences taxonomy, duplicate species detection and 
visual identification of poorly aligned sequences (Supplementary Information). 
We removed rogue sequences using the RogueNaRok searches*® and performed 
preliminary tree searches in RAxML to identify and remove sequences with patho- 
logically long branches due to misalignment. After curation of the raw alignment, 
our final alignment contained 11,638 species. We used PartitionFinder* to identify 
a model of sequence evolution for multigene alignment and RAxML to find the 
maximum likelihood topology and calculate Shimodaira-Hasegawa-like support 
values*? (Supplementary Information). 

Divergence time analysis. We surveyed the palaeontological literature and museum 
catalogues to assemble our actinopterygian fossil calibration set (139 early occur- 
rences for 130 nodes; see Supplementary Information and ‘Data availability’). Fossil 
assignment to nodes was based upon synapomorphies for that node from published 
phylogenetic studies, diagnostic characters for taxonomic ranks and/or detailed 
surveys of clade fossil records by experts. Fossil ages were used as minimum age 
constraints; maximum ages were derived for all nodes following the Whole Tree 
Extension of the Hedman algorithm”, a probabilistic method that incorporates 
outgroup ages and that has recently been implemented in R*”. We identified 130 
nodes that could be assigned fossil constraints (Supplementary Information) and 
time-calibrated the phylogeny using treePL**. A graphical summary of the dis- 
tribution of calibrations across the phylogeny is shown in Extended Data Fig. 1. 
Phylogenetic placement of unsampled species by taxonomy. Using the time- 
calibrated phylogeny as a backbone, we generated a distribution of trees in which 
missing taxa were placed according to their taxonomy. For each of the unsampled 
species of ray-finned fish, we assigned the most restrictive taxonomic rank (for 
example, genus, family, order) that was recovered as monophyletic in our maxi- 
mum likelihood phylogeny. To determine divergence times for unsampled species 
in the phylogeny, we sampled from a distribution of waiting times conditioned on 
rank-specific estimates of the speciation rate and sampling fraction using a custom 
Python script implementing functions from TreePar and SimTree*”~*!, and added 
unsampled species based on the assigned taxonomic rank and inferred waiting 
time. This procedure was repeated 100 times to generate a distribution of fully 
sampled ray-finned fish phylogenies (Supplementary Information). This proce- 
dure is similar to stochastic polytomy resolution as implemented in PASTIS”, but 
permits construction of extremely large phylogenies using all molecular data in a 
single analysis, rather than a two-stage process that begins with a reduced backbone 
dataset followed by separate tree searches for each crown lineage. Additionally, 
our procedure generates a local estimate of diversification rates at each taxonomic 
node, rather than using a global diversification rate, permitting more accurate 
placements of unsampled taxa when diversification rate heterogeneity may exist. 
Estimation of geographic ranges and species richness. We used the AquaMaps 
algorithm*** to estimate geographic ranges for marine fishes. These maps were 
generated using an environmental envelope approach that predicts species distri- 
butions based on available species-specific occurrence records at the 0.5°-grid- 
cell scale in conjunction with the following environmental predictors: depth, sea 
surface temperature, salinity, proportional ice cover and primary productivity. 
The predictive algorithm also incorporated geographical bounding boxes to limit 
occurrences to known ocean-scale distributions for each taxon. We transformed 
the AquaMaps distributions to a Behrmann equal area projection, and upscaled the 
resulting grids to 150 x 150-km resolution. We then converted the AquaMaps suit- 
ability scores for each cell to binary presence or absence by applying a fixed thresh- 
old of 0.5. This threshold was selected based on manual inspection of a number of 
individual species ranges. Expert opinion was then used when available to further 
refine the projected distributions, typically by truncating the AquaMaps predic- 
tions in light of museum occurrence data, known biogeographical barriers and spe- 
cialist literature on particular taxa. Where available, we incorporated more accurate 
distributional maps produced by taxonomic experts in particular groups***. The
final dataset included maps for 12,050 marine species out of an estimated total of 
15,500 described marine species’?. Our conclusions should be unaffected by these 
missing and uncommon taxa, given that we were able to reconstruct the previously 
hypothesized pattern of marine fish richness on a global scale*. 
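
The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the suitability scores, species names and function names are hypothetical, and whether a score exactly equal to 0.5 counts as present is not stated in the text (the sketch assumes it does).

```python
from typing import Dict, List


def to_presence(suitability: List[float], threshold: float = 0.5) -> List[bool]:
    """Convert per-cell suitability scores to binary presence or absence.

    Assumption: scores greater than or equal to the threshold count as present.
    """
    return [score >= threshold for score in suitability]


def species_richness(presence_by_species: Dict[str, List[bool]]) -> List[int]:
    """Count the species marked present in each grid cell."""
    return [sum(cell) for cell in zip(*presence_by_species.values())]


# Hypothetical suitability scores for two species over three grid cells
suitability = {
    "species_a": [0.9, 0.5, 0.1],
    "species_b": [0.2, 0.7, 0.49],
}
presence = {name: to_presence(scores) for name, scores in suitability.items()}
richness = species_richness(presence)
```

Per-cell richness is then simply the number of species whose thresholded range covers the cell.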


Occurrence-based analyses. As an alternative to range predictions from 
AquaMaps suitability scores, we performed a parallel set of occurrence-based anal- 
yses in which we reconstructed cell-based species assemblages as well as species 
latitudinal midpoints. We obtained all actinopterygian records from four major 
biodiversity occurrence aggregators (Global Biodiversity Information Facility 
(GBIF), Ocean Biogeographic Information System (OBIS), Fishnet2 and VertNet) 
between February 2014 and January 2015 and removed redundancies, resulting 
in a total of 13,322,575 marine fish occurrences. We downloaded all actinoptery- 
gian data from GBIF (https://www.gbif.org/) using their download API version 1; 
FishNet2 data (http://www.fishnet2.net/) were acquired using a custom Python 
script to download KML files for each species. VertNet (http://www.vertnet.org/) 
and OBIS (http://www.iobis.org/) data were retrieved by contacting the adminis- 
trators of these databases, who then provided us with the relevant data. To reconcile 
and combine the four datasets, we used museum accession numbers to deduplicate 
identical records contained in multiple databases. Where accession identifiers were 
inconsistent within a single museum, we unified these accessions onto a common 
scheme using a custom Python script. To reconcile species names by resolving 
synonyms and other sources of error, we used the same procedure described in 
the Supplementary Information. Institutions contributing substantially to the 
occurrence dataset are listed in Supplementary Table 6. 
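
As a rough sketch of deduplication by museum accession number, the following Python keeps the first record seen for each institution and accession pair. The record fields and the normalization rule (trimming whitespace and upper-casing) are assumptions made for illustration; the custom script used in the study to unify accession schemes is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def accession_key(record: Record) -> Tuple[str, str]:
    """Build a comparison key from hypothetical 'institution' and 'accession' fields."""
    return (record["institution"].strip().upper(),
            record["accession"].strip().upper())


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record for each (institution, accession) key."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Record] = []
    for record in records:
        key = accession_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records for the same specimen served by two aggregators
records = [
    {"institution": "USNM", "accession": "12345", "source": "GBIF"},
    {"institution": "usnm ", "accession": "12345", "source": "VertNet"},
    {"institution": "USNM", "accession": "67890", "source": "OBIS"},
]
unique_records = deduplicate(records)
```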

The occurrence dataset was filtered to exclude records that fell on land, and
records with zero-zero or other nonsensical coordinates. Species richness counts
were then calculated across a global grid at 300 x 300 km resolution, using the 
Behrmann equal area projection. We further excluded isolated grid cells with 
recorded species, and removed cells that were greater than two standard deviations 
from the residuals of a thin plate spline interpolation that was fit to the species 
richness grid. These filters allowed us to remove cells that were probably unre- 
alistic representations of the species diversity at those locations. For all analyses 
presented, the same richness and bathymetry filters were applied that were used 
with the primary map data. 
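
A minimal Python sketch of the coordinate filters described above follows. The land test is passed in as a function because the land mask used in the study is not specified; the toy mask, record fields and function names are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def has_plausible_coordinates(lat: float, lon: float) -> bool:
    """Reject zero-zero points and coordinates outside valid ranges."""
    if lat == 0.0 and lon == 0.0:
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def filter_occurrences(records: List[Record],
                       is_land: Callable[[float, float], bool]) -> List[Record]:
    """Keep records with plausible coordinates that do not fall on land."""
    return [
        rec for rec in records
        if has_plausible_coordinates(rec["lat"], rec["lon"])
        and not is_land(rec["lat"], rec["lon"])
    ]


# Hypothetical stand-in for a real land mask: longitudes east of 100 degrees are "land"
def toy_land_mask(lat: float, lon: float) -> bool:
    return lon > 100.0


records = [
    {"lat": 0.0, "lon": 0.0},      # zero-zero placeholder coordinates
    {"lat": 95.0, "lon": 10.0},    # latitude out of range
    {"lat": -62.5, "lon": -58.0},  # plausible marine point
    {"lat": 10.0, "lon": 120.0},   # flagged as land by the toy mask
]
kept = filter_occurrences(records, toy_land_mask)
```

The isolated-cell and spline-residual filters are not shown, because they depend on the interpolation settings used for the richness grid.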
Estimates of speciation rate. We reconstructed speciation rates using (i) an inverse 
equal-splits measure of speciation rate (λ_DR; also known as the ‘DR statistic’?!°°),
(ii) BAMM estimates of speciation rate allowing time-varying rate regimes
(λ_BAMM)?2>2»3, and (iii) BAMM estimates of speciation rate assuming time constancy
of speciation rates within rate regimes (λ_BAMM-TC). For the λ_DR analyses, we
accounted for missing taxa by computing λ_DR for each tip in the ATA 31,526 taxon
phylogenies; we then computed the average value for each taxon across the set 
of 100 trees generated with stochastic polytomy resolution. Stochastic polytomy 
resolution generates taxonomic placements that may compromise inferences of 
trait-dependent diversification because taxa are placed on trees in a manner that 
is inconsistent with the underlying process of trait evolution™, and we excluded 
all taxa lacking genetic data from formal statistical analysis of the relationship 
between latitude and speciation. However, including these taxa during estimates 
of λ_DR reduces bias due to incomplete taxon sampling and our calculations of λ_DR
effectively integrate over the number and location of unsampled species.
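
As a minimal sketch of the inverse equal-splits calculation, the following Python computes the DR rate for a tip from the branch lengths on its path to the root, then averages each tip's rate across a set of trees. The weighting, in which each successively deeper edge is down-weighted by a further factor of 1/2, follows the standard definition of the equal-splits measure. The toy trees, function names and path-based input format are hypothetical and are not taken from the authors' scripts.

```python
from statistics import mean
from typing import Dict, List


def dr_rate(edges_tip_to_root: List[float]) -> float:
    """Inverse equal-splits (DR) rate for a single tip.

    edges_tip_to_root[0] is the terminal branch length (Myr); each
    successively deeper edge is down-weighted by a further factor of 1/2.
    """
    equal_splits = sum(length / 2 ** j
                       for j, length in enumerate(edges_tip_to_root))
    return 1.0 / equal_splits


def mean_dr_across_trees(trees: List[Dict[str, List[float]]]) -> Dict[str, float]:
    """Average each tip's DR rate over a set of trees sharing the same tips."""
    tips = trees[0].keys()
    return {tip: mean(dr_rate(tree[tip]) for tree in trees) for tip in tips}


# Hypothetical toy example: two alternative resolutions of the same three tips
tree_1 = {"A": [1.0, 2.0], "B": [1.0, 2.0], "C": [3.0]}  # ((A:1,B:1):2,C:3)
tree_2 = {"A": [2.0, 1.0], "B": [2.0, 1.0], "C": [3.0]}  # ((A:2,B:2):1,C:3)
rates = mean_dr_across_trees([tree_1, tree_2])
```

Because short terminal branches dominate the weighted sum, tips that split recently receive high rates, which is why the measure reflects recent speciation.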

BAMM analyses were performed on the time-calibrated phylogeny contain- 
ing 11,638 tips for which genetic data were available. For each of the two classes 
of BAMM models (λ_BAMM and λ_BAMM-TC), we performed three BAMM runs for
50 million generations using default MCMC operators and a prior expectation of 
500 shifts to facilitate convergence’. Raw output and control files to repeat these 
analyses are available through the Dryad data repository (see ‘Data availability’). 
We were unable to achieve satisfactory convergence when running BAMM on the 
all-taxon (31,526 tip) phylogenies; we therefore used sampling fractions to account 
for the effects of incomplete sampling. We corrected for incomplete sampling at 
the family level. We computed the mean of the marginal posterior distribution of 
speciation rates for each tip in the phylogeny for both λ_BAMM and λ_BAMM-TC. As 
an alternative to λ_BAMM and λ_DR, we computed a simple node density estimate of 
speciation rate. For each taxon, these estimates are computed simply as n_T/T, 
in which n_T is the number of nodes on a path of length T, traversing the tree 
backwards from the tips towards the root. An estimate for an interval of 5 Myr 
would represent the average speciation rate for a given tip during the past 5 Myr. 
We computed node density estimates of speciation rate for a sequence of intervals 
between 0.25 and 50 Myr (Extended Data Fig. 6 and Supplementary Information). 
As for λ_DR, the node density estimates of speciation rate were computed over the 
full set of ATA phylogenies. 
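
The node density estimate can be sketched in a few lines of Python. The tree encoding and function name below are hypothetical, and the convention that a node lying exactly at time T is counted is an assumption made for illustration, not something stated in the text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tree encoding: node -> (parent, length of branch to parent).
Tree = Dict[str, Tuple[Optional[str], float]]


def node_density(tree: Tree, tip: str, interval: float) -> float:
    """Return n_T / T for one tip.

    n_T is the number of nodes met while walking from the tip towards the
    root over a path of length T (`interval`, in the tree's time units).
    """
    elapsed = 0.0
    nodes = 0
    node = tip
    while tree[node][0] is not None:
        parent, length = tree[node]
        elapsed += length
        if elapsed > interval:
            break  # the next node is older than the interval
        nodes += 1
        node = parent
    return nodes / interval


# Toy tree ((A:1,B:1):1,C:2); branch lengths in Myr
example: Tree = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "C": ("root", 2.0),
}
rate_short = node_density(example, "A", 1.5)  # one node within 1.5 Myr
rate_long = node_density(example, "A", 5.0)   # two nodes within 5 Myr
```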

Grid-based analyses of speciation rate. We computed mean speciation rates 
(λ_DR, λ_BAMM and λ_BAMM-TC) for regional assemblages of fishes, focusing on sets of 
species that are presumed to occur together at the scale of the 150 × 150-km grid 
cell. We computed the mean rate for individual grid cells four different ways, to 
reduce spatial and taxonomic pseudoreplication across cells. The simplest approach 
involved computing the arithmetic mean λ for all species inferred to be present in a 
particular cell (Fig. 1). Following Jetz et al., we computed weighted arithmetic and 
geometric means of speciation rates to reduce the contribution of geographically 
widespread taxa to the overall mean. For the arithmetic mean, the rate for the kth 
grid cell is computed as λ_k = (Σ w_i λ_i)/(Σ w_i), in which Σ denotes a summation over all 
N species (i = 1 to i = N) present in the kth cell, and w_i is the weight assigned to the 
ith species. We computed weights for each species as the inverse of the number of 
grid cells in which the species was found. Therefore, geographically widespread 
taxa contribute less to a cell mean than a taxon with a highly restricted geographical 
distribution. Finally, we computed cell means for ‘realm endemics’, that is, species 
uniquely found in one of 12 biogeographical realms under the MEOW marine 
bioregionalization scheme (n = 1,053 and 3,100 realm endemics with and without 
genetic data). The analysis of endemic taxa is particularly informative as such taxa 
provide more localized information about speciation rates in particular geographical 
regions relative to widespread taxa that may be found in multiple regions. 
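
The range-size weighting of cell means can be sketched as follows. The weighted arithmetic mean follows the formula given above. The weighted geometric mean is written here as the exponential of the weighted mean of log rates, which is an assumption because the text does not give its formula. Species names, rates and function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Set


def range_weights(cells_by_species: Dict[str, Set[int]]) -> Dict[str, float]:
    """Weight each species by the inverse of the number of cells it occupies."""
    return {sp: 1.0 / len(cells) for sp, cells in cells_by_species.items()}


def weighted_arithmetic_mean(rates: Dict[str, float],
                             weights: Dict[str, float],
                             present: Iterable[str]) -> float:
    """Cell rate = sum(w_i * rate_i) / sum(w_i) over species in the cell."""
    present = list(present)
    total_w = sum(weights[sp] for sp in present)
    return sum(weights[sp] * rates[sp] for sp in present) / total_w


def weighted_geometric_mean(rates: Dict[str, float],
                            weights: Dict[str, float],
                            present: Iterable[str]) -> float:
    """Assumed form: exp of the weighted mean of log rates."""
    present = list(present)
    total_w = sum(weights[sp] for sp in present)
    log_mean = sum(weights[sp] * math.log(rates[sp]) for sp in present) / total_w
    return math.exp(log_mean)


# Toy data: one narrow-ranged and one widespread species share cell 1.
cells_by_species = {"restricted_sp": {1}, "widespread_sp": {1, 2, 3, 4}}
rates = {"restricted_sp": 0.4, "widespread_sp": 0.1}
weights = range_weights(cells_by_species)
cell_1 = ["restricted_sp", "widespread_sp"]
arith = weighted_arithmetic_mean(rates, weights, cell_1)
geom = weighted_geometric_mean(rates, weights, cell_1)
```

With these toy values the narrow-ranged species carries four times the weight of the widespread one, so both means sit closer to 0.4 than the unweighted mean of 0.25 would.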

To formally assess the relationship between latitude and assemblage-level speciation 
rate, we first computed the mean speciation rate for all cells within a particular 
ecoregion (n = 262) under the MEOW biogeographical regionalization. 
We modelled the speciation-latitude relationship at the scale of ecoregions rather 
than individual cells because of the high autocorrelation between adjacent cells, 
which was reduced at the ecoregion scale, and to reduce the computational burden 
associated with analysing the full (16,150 grid cell) dataset. To account for spatial 
autocorrelation between ecoregions, we implemented simultaneous autoregressive 
error (SAR) models using the spdep package in R. These and other statistical 
tests are two-tailed. We defined neighbours for SAR models as those ecoregions 
with contiguous boundaries; we then selected the appropriate weighting scheme 
using Akaike information criterion (AIC) model selection. Simple visual inspection 
of our data (Fig. 1 and Extended Data Fig. 2) and ordinary least squares (OLS) 
breakpoint regressions reveal a clear biphasic signal in the relationship between 
speciation rate and latitude, with a linear increase for higher latitude cells (approx- 
imately 30° N and 30° S) and a much weaker relationship for low (tropical) lati- 
tudes. We therefore considered an expanded set of breakpoint SAR models with no 
relationship between absolute latitude and speciation for cells below a particular 
threshold value, and a linear effect of absolute latitude on speciation above the 
threshold. We used maximum likelihood analyses to estimate the threshold loca- 
tion and we compared the fit of the breakpoint model to a simple no-breakpoint 
SAR model using AIC (Extended Data Fig. 2). We used Moran's I to test for spatial 
autocorrelation in the residuals of OLS and SAR regressions to determine whether 
the SAR model successfully accounted for spatial non-independence in the data. 
We tested the relationship between assemblage speciation rate and latitude for 
ecoregions with absolute latitude less than the previously identified breakpoints 
(for example, tropical and other low-latitude regions). In general, there is at most 
a marginal effect of latitude on speciation rate for tropical and subtropical regions 
(Extended Data Fig. 2h). Finally, we estimated endemism for each MEOW marine 
biogeographical province using two analyses of occupancy. These two analyses of 
regional endemism, E, are given by E = (1/N) Σ (1/O_k), in which N is the number 
of species occurring in the focal region, O_k is the estimated global occupancy of 
the kth species from that region, and Σ denotes a summation from k = 1 to k = N. 
Occupancy is computed as either the total number of biogeographical provinces or 
as the total number of equal-area grid cells in which the taxon is found. 
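
The breakpoint comparison can be illustrated with a simplified sketch. The code below uses ordinary least squares and a grid search over candidate thresholds; it is not the spatial SAR model fitted with spdep, and it ignores spatial autocorrelation entirely. The AIC is the Gaussian form up to an additive constant, and the parameter counts (three for the linear model, four for the breakpoint model) are assumptions. The synthetic data and all names are hypothetical.

```python
import math


def ols_rss(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residual sum of squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def gaussian_aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted parameters."""
    return n * math.log(rss / n) + 2 * k


def fit_linear(abs_lat, rate):
    a, b, rss = ols_rss(abs_lat, rate)
    return {"intercept": a, "slope": b, "aic": gaussian_aic(rss, len(rate), 3)}


def fit_breakpoint(abs_lat, rate, candidates):
    """Slope is zero below the threshold c and linear above it."""
    best = None
    for c in candidates:
        hinge = [max(0.0, lat - c) for lat in abs_lat]
        if max(hinge) == 0.0:
            continue  # threshold lies beyond the data; slope not estimable
        a, b, rss = ols_rss(hinge, rate)
        aic = gaussian_aic(rss, len(rate), 4)
        if best is None or aic < best["aic"]:
            best = {"breakpoint": c, "intercept": a, "slope": b, "aic": aic}
    return best


# Synthetic ecoregions: flat rate below 30 degrees, linear increase above.
abs_lat = [float(d) for d in range(81)]
rate = [0.1 + 0.003 * max(0.0, d - 30.0) + 0.0002 * math.sin(d) for d in abs_lat]
linear = fit_linear(abs_lat, rate)
hinged = fit_breakpoint(abs_lat, rate, [5.0 + 0.5 * i for i in range(131)])
```

A lower AIC for the hinged fit than for the single-slope fit is the pattern that favours a breakpoint model in this simplified setting.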
Trait-dependent speciation. We treated the absolute value of the latitudinal midpoint 
of each species as a ‘trait’ and tested its relationship to speciation rates using 
formal statistical methods for analysing trait-dependent diversification. The 
latitudinal midpoint for each species was computed as the centroid midpoint of 
the geographical range of the species. We used three recently developed methods 
for testing the effects of species traits on lineage speciation rates that are robust to 
phylogenetic pseudoreplication and model misspecification. Using ES-SIM, 
we tested whether λ_DR was correlated with absolute latitudinal midpoint for individual 
species. Using FiSSE, we then tested whether two discrete classifications 
of species by latitude (‘low latitude/tropical’ versus ‘high latitude/temperate’) differed 
in their rate of speciation as measured using λ_DR. We performed the FiSSE 
test across a range of thresholds for classifying lineages as tropical and temper- 
ate (23.5°, 25°, 30°, 35°, 40°, 45°, 50°, 55° and 60°). Regardless of the threshold, 
all FiSSE results indicated a highly significant effect of latitude on speciation 
rate (Extended Data Table 1). As an alternative method for continuous traits, 
we used STRAPP to test whether latitude was correlated with the two BAMM-based 
measures of speciation rate (λ_BAMM and λ_BAMM-TC). Results for λ_BAMM and 
λ_BAMM-TC were almost identical and identified a strong effect of latitude on speciation 
rate (Pearson’s r = 0.30–0.31; P < 0.006). One possible explanation for our 
results is that high-latitude assemblages are enriched for deep-sea taxa, and that 
faster speciation is actually a property of deep-sea environments and not high 
latitudes. To test this hypothesis, we obtained depth classifications for marine 
fishes from FishBase (http://www.fishbase.org); minimum and maximum depths 
were available for 4,089 species (of 5,231 total species). We used ES-SIM to test 
the relationship between latitude and speciation rate (λ_DR) for fishes with minimum 
depth >200 m. Using FiSSE, we tested whether speciation rates were faster for 
low-latitude deep-sea fishes relative to low-latitude shallow or reef-associated 
species (Extended Data Fig. 7a). 
Additional checks on statistical robustness. We performed several additional 
checks on the robustness of the latitude-speciation correlation. We visualized lat- 
itudinal trends in terminal branch length, which is expected to correlate inversely 
with underlying speciation rate. We obtained estimates of the mean terminal 
branch length of each species from the distribution of ATA (31,526 taxon) phy- 
logenies. The inverse of these branch lengths is the simplest possible estimate of 
the instantaneous rate of speciation, although it is an extremely noisy metric; λ_DR 
is similar but includes the weighted contribution of earlier branches to increase the 
signal-to-noise ratio. Despite the overall noisiness of the terminal branch length 
metric, there is a clear trend towards shorter terminal branches for high-latitude 
taxa (Extended Data Fig. 9). 

We also tested whether our speciation rate estimates could have been driven 
by latitudinal gradients in genetic taxon sampling, as might be the case if a higher 
percentage of high-latitude taxa had DNA sequences with which to infer their 
phylogenetic position without relying on stochastic polytomy resolution. To for- 
mally address this potential confounding variable, we fitted multiple regression 
models to the relationship between lineage-specific speciation rate and latitudinal 
midpoint, but including the sampling fraction for each species as a covariate. The 
sampling fraction for each taxon was simply the percentage of total species from 
the corresponding family-level clade that contained genetic data (for example, the 
percentage of total species from each family that were represented in the genetic 
supermatrix). These sampling fractions were identical to those used to correct 
for incomplete sampling in the BAMM analyses. Visual inspection and formal 
analysis show a minimal effect of sampling fraction on the patterns reported here 
(Extended Data Fig. 9). 
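
One way to quantify how much of the explained variation belongs to latitude rather than to sampling fraction is a sequential sums-of-squares decomposition. The sketch below enters latitude first and sampling fraction second; that ordering and the use of sequential (Type I) sums of squares are assumptions, since the text does not specify them. The data and function names are hypothetical.

```python
def residuals(x, y):
    """Residuals of y after least-squares regression on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


def sequential_ss(latitude, sampling, rate):
    """Return (SS latitude, SS sampling given latitude, residual SS)."""
    n = len(rate)
    mean_rate = sum(rate) / n
    ss_total = sum((r - mean_rate) ** 2 for r in rate)
    # Step 1: variation in rate explained by latitude alone.
    rate_resid = residuals(latitude, rate)
    rss_latitude = sum(e * e for e in rate_resid)
    ss_latitude = ss_total - rss_latitude
    # Step 2: extra variation explained by sampling fraction once latitude
    # has been removed from both variables (Frisch-Waugh residual regression).
    samp_resid = residuals(latitude, sampling)
    s_ee = sum(e * e for e in samp_resid)
    s_er = sum(a * b for a, b in zip(samp_resid, rate_resid))
    ss_sampling = (s_er ** 2) / s_ee if s_ee > 0 else 0.0
    return ss_latitude, ss_sampling, rss_latitude - ss_sampling


# Toy data in which rate depends strongly on latitude, weakly on sampling.
latitude = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
sampling = [0.5, 0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3]
rate = [0.1 + 0.002 * lat + 0.01 * s for lat, s in zip(latitude, sampling)]
ss_lat, ss_samp, ss_resid = sequential_ss(latitude, sampling, rate)
share_latitude = ss_lat / (ss_lat + ss_samp)
```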

Finally, we tested whether variation in the rate of molecular evolution could 
drive spurious variation in the rate of diversification. A systematic bias towards 
low rates of molecular evolution can lead to apparent fast rates of diversification 
in slowly evolving lineages as an artefact of the algorithms used for time-scaling 
the raw (uncalibrated) phylogeny. If our results are affected by this bias, we expect 
to observe (i) a general trend towards slower rates of molecular evolution at high 
latitudes, and (ii) a negative correlation between speciation rate and the rate of 
molecular evolution. To estimate rates of molecular evolution, we computed the 
relative root-to-tip distances for each taxon in the phylogeny and estimated their 
correlation with both latitude and λ_DR (Extended Data Fig. 9). There is no evidence 
that higher latitudes are associated with slower rates of molecular evolution, or 
that rates of molecular evolution are negatively correlated with λ_DR (see Extended 
Data Fig. 9). 
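
The root-to-tip check can be sketched as a path-length sum on the uncalibrated tree followed by a Pearson correlation. The tree encoding, branch lengths and function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical tree encoding: node -> (parent, length of branch to parent),
# with branch lengths in substitutions per site (non-ultrametric tree).
Tree = Dict[str, Tuple[Optional[str], float]]


def root_to_tip(tree: Tree, tip: str) -> float:
    """Sum of branch lengths on the path from a tip to the root."""
    total = 0.0
    node = tip
    while tree[node][0] is not None:
        parent, length = tree[node]
        total += length
        node = parent
    return total


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy tree ((A:0.10,B:0.30):0.05,C:0.20);
example: Tree = {
    "root": (None, 0.0),
    "AB": ("root", 0.05),
    "A": ("AB", 0.10),
    "B": ("AB", 0.30),
    "C": ("root", 0.20),
}
distances = [root_to_tip(example, tip) for tip in ("A", "B", "C")]
```

The resulting distances would then be correlated with latitudinal midpoints and with tip speciation rates, for example `pearson_r(distances, midpoints)` for a hypothetical list `midpoints` in the same tip order.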

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. All scripts and code necessary to repeat the analyses described 
here have been made available through the Dryad digital data repository 
(https://doi.org/10.5061/dryad.fc71cp4). 

Data availability. All data necessary to repeat the analyses described here have 
been made available through the Dryad digital data repository 
(https://doi.org/10.5061/dryad.fc71cp4). Phylogenetic tree distributions are also available 
through http://fishtreeoflife.org. 
[Figure: phylogenetic subclade panels labelled A + E + L + P, Anguilliformes, Osmeriformes, 
B + H, Labriformes, Gobiiformes, Osteoglossiformes, Clupeiformes, Siluriformes, 
Lampridiformes and Tetraodontiformes] 

Extended Data Fig. 1 | Phylogenetic placement of fossil calibrations 
in major fish lineages. Major lineages are broken into subclades 
(top) to visualize fossil calibrations and are coloured by taxonomic 
order. Numbered nodes are described in the calibration report in the 
Dryad data repository. The same calibrations are red circles in the full 
phylogeny (bottom). A + E + L + P: Acipenseriformes, Elopiformes, 
Lepisosteiformes, Polypteriformes; A + E + S: Argentiniformes, 
Esociformes, Salmoniformes; B + H: Beryciformes, Holocentriformes; 
C + S + P: Centrarchiformes, Scombriformes, Perciformes; C + U: 
Chaetodontiformes, Uranoscopiformes; G + G: Gonorynchiformes, 
Gymnotiformes; G + O + S: Galaxiiformes, Osmeriformes, 
Stomiatiformes; P + Z: Percopsiformes, Zeiformes. 
AIC2 (breakpoint)  breakpoint  slope  p        SAR.I   SAR.Ip  OLS.I  OLS.Ip 
−1306.396          40          0.003  < 0.001  −0.011  0.546   0.55   < 0.001 
−1072.144          27          0.003  < 0.001  0.015   0.357   0.525  < 0.001 
−1250.247          34.5        0.003  < 0.001  0.011   0.385   0.544  < 0.001 
−1275.119          41.5        0.004  < 0.001  −0.048  0.778   0.525  < 0.001 
−967.732           33.5        0.004  < 0.001  0.045   0.187   0.521  < 0.001 
−1164.07           35          0.003  < 0.001  0.021   0.322   0.442  < 0.001 

AIC (SAR)  OLS.slope  SAR.slope  p       slope ratio  SAR.I   SAR.Ip  OLS.I  OLS.Ip 
−1060.477  0.0002     0.0002     0.0125  0.076        0.052   0.115   0.318  < 0.001 
−778.44    0.0002     < 0.0001   0.7473  0.014        0.032   0.285   0.495  < 0.001 
−1014.502  0.0002     0.0001     0.0856  0.044        0.017   0.365   0.334  < 0.001 
−1083.7    < 0.0001   0.0001     0.2089  0.028        0.028   0.267   0.386  < 0.001 
−805.78    0.0003     < 0.0001   0.6231  0.022        0.001   0.453   0.499  < 0.001 
−950.599   < 0.0001   < 0.0001   0.3601  −0.034       −0.012  0.527   0.465  < 0.001 


Extended Data Fig. 2 | Relationships between mean speciation rates 
and latitude for 262 marine ecoregions using alternative methods 
for the computation of the cell rates. a–c, λ_BAMM versus latitude. 
d–f, λ_DR versus latitude. Ecoregion rates are mean rates across all cells 
assigned to each biogeographical region. Arithmetic mean is the mean 
rate across all taxa inferred to occur in the cell; weighted arithmetic 
and weighted geometric means assign proportionately greater weight to 
species with small geographical ranges. Weighting schemes for speciation 
metrics are described in the Methods. g, Simultaneous autoregressive 
(SAR) spatial error models for the effects of absolute latitude on mean 
speciation rates for ecoregions. AIC1 gives the Akaike information 
criterion (AIC) for a linear model with a single slope and intercept term; 
AIC2 is the corresponding AIC for a breakpoint model that assumes 
no relationship (slope = 0) between absolute latitude and speciation 
rate for all values below some threshold, and a linear relationship for 
latitudes that exceed the threshold. SAR.I and SAR.Ip are global Moran's 
I estimates and associated P values for assessing the presence of residual 
spatial autocorrelation in the model residuals; OLS.I and OLS.Ip are the 
corresponding values for ordinary least squares (OLS) regression that 
ignores spatial autocorrelation. All SAR models show highly significant 
effects of latitude on speciation rate, and breakpoint models provided 
a consistently better fit than models without a breakpoint. h, OLS and 
SAR models for the effects of absolute latitude on speciation rate for 
low-latitude grid ecoregions only. The slope ratio term gives the ratio of slopes 
for low-latitude ecoregions (below the corresponding breakpoint; g) to the 
slope for ecoregions with latitude above the breakpoint. Overall, there is a 
marginal effect of latitude on speciation rate for low-latitude ecoregions. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




[Figure panels: a, all species λ_DR; b, all species λ_BAMM-TC; c, endemic species richness; 
d, e, speciation rate versus latitude (ecoregion centroid)] 

f 
metric                              AIC1      AIC2 (breakpoint)  breakpoint  slope  p        SAR.I   SAR.Ip  OLS.I  OLS.Ip 
λ_BAMM: unweighted arithmetic mean  −692.771  −703.745           27.5        0.003  < 0.001  −0.001  0.47    0.566  < 0.001 
λ_BAMM: weighted mean, arithmetic   −598.557  −609.271           27.5        0.003  < 0.001  0.013   0.389   0.493  < 0.001 
λ_BAMM: weighted mean, geometric    −692.316  −707.397           32          0.003  < 0.001  0.012   0.392   0.462  < 0.001 
λ_DR: unweighted arithmetic mean    −575.146  −579.98            37          0.003  < 0.001  −0.09   0.898   0.622  < 0.001 
λ_DR: weighted mean, arithmetic     −471.397  −479.062           27.5        0.004  < 0.001  −0.012  0.537   0.46   < 0.001 
λ_DR: weighted mean, geometric      −556.135  −568.005           31.5        0.004  < 0.001  −0.002  0.475   0.327  < 0.001 

Extended Data Fig. 3 | Relationships between speciation rate and 
latitude for alternative speciation rate metrics and for endemic taxa 
only. a, b, Global maps of λ_DR and λ_BAMM-TC, as in Fig. 1. c, Global 
map of endemic species richness, by grid cell. ‘Endemic’ taxa are those 
that are restricted to a single MEOW realm; an endemic taxon can 
occur in multiple grid cells provided all grid cells are contained within 
a single realm. d, Relationship between speciation rates (λ_DR) and 
latitude for ecoregions (n = 232), computed using realm endemics only. 
e, Relationship between speciation rates (λ_BAMM-TC) and latitude for 
ecoregions, computed using realm endemics only. f, SAR spatial error 
models for the relationship between ecoregion speciation rates and 
absolute latitude, for which ecoregion means are computed from 
single-realm endemics only. Weighting schemes for assemblages are described 
in the Methods. SAR modelling results are presented as in Extended Data 
Fig. 2g and show a strong correlation between latitude and speciation rate. 
e 
Description                    Endemicity measure     Speciation measure  r     p 
Speciation rate by endemism    Occupancy (provinces)  λ_BAMM              0.79  < 0.001 
Speciation rate by endemism    Occupancy (provinces)  λ_DR                0.79  < 0.001 
Speciation rate by endemism    Occupancy (provinces)  λ_BAMM-TC           0.79  < 0.001 
Speciation rate by endemism    Range size             λ_BAMM              0.57  < 0.001 
Speciation rate by endemism    Range size             λ_DR                0.61  < 0.001 
Speciation rate by endemism    Range size             λ_BAMM-TC           0.56  < 0.001 
Endemism by absolute latitude  Occupancy (provinces)  -                   0.58  < 0.001 
Endemism by absolute latitude  Range size             -                   0.46  < 0.001 

Extended Data Fig. 4 | Speciation rate, species richness, temperature 
and endemism. a, Negative relationship between species richness 
and mean speciation rate (λ_DR) for individual grid cells. b, Negative 
relationship between mean annual sea surface temperature and mean 
speciation rate. c, d, Same as a, b, but for BAMM with time-constant 
rate regimes (λ_BAMM-TC). Grid cells as in Fig. 1 (n = 16,150). See Fig. 2 
for comparison. e, Correlation between mean speciation rate for 
MEOW biogeographical provinces and two measurements of regional 
endemism. ‘Occupancy (provinces)’ measures endemism as the inverse 
of the mean number of provinces occupied by each taxon that occurs in 
a particular province. ‘Range size’ is the inverse mean range size across 
all taxa occurring in a given province. High values of endemism indicate 
that a given region consists of species that are found in fewer additional 
provinces, or of species with smaller geographical ranges. The bottom 
two rows show the correlations between the endemism parameters and 
latitude. 
Extended Data Fig. 5 | Speciation rates for individual taxa as a function 
of latitudinal midpoint. a, λ_DR for all marine species with genetic data 
(n = 5,229) as a function of the latitudinal (centroid) midpoint of their 
geographical range. Non-phylogenetic OLS regression with quadratic 
term is overlaid on points to denote trend in mean rates. b, λ_BAMM for the 
same taxon set. c, Sliding window analysis of λ_DR distributional quantiles 
in speciation rates by individual taxa with respect to latitudinal midpoint. 
Contours denote quantiles from 0.10 to 0.90, in 0.10 increments, with 
a sliding window size of 6°. Dark red line is the median rate. 
d, Distributional quantiles of λ_BAMM for all species with respect to 
latitudinal midpoint; dark red line is median rate. 
Extended Data Fig. 6 | Temporal dimension of speciation rate variation 
as a function of latitude. a, Mean speciation rates for taxa from low 
latitudes (<30°), intermediate latitudes (30–60°) and high latitudes (>60°) 
computed using the interval method. Per-taxon interval-based rates were 
computed for time intervals between 0.25 and 50 million years before 
present. Time-averaged speciation rates for high-latitude fishes are much 
higher than those inferred for low-latitude fishes, even across timescales 
that exceed 20 million years. b, Rate differential between high-latitude 
and low-latitude taxa as a function of interval duration. c, Speciation-over-time 
curves reconstructed using the time-varying rates model in 
BAMM for 14 clades of high-latitude (blue) and low-latitude (red) fishes. 
Inset numbers for each panel give the numbers of low-, intermediate- 
and high-latitude (from left to right) taxa from each clade for which 
geographical range data are available. Low-latitude clades were selected to 
represent high-diversity and iconic reef-associated clades that contribute 
substantially to the tropical diversity peak in marine fishes. With the 
possible exception of gobies, there is no signal of early, rapid speciation in 
low-latitude or tropical shallow-water clades. 
a 
Classification                N_shallow  N_deep  DR_shallow  DR_deep  p 
Mean Depth (200 m)            2273       517     0.11        0.12     0.91 
Mean Depth (500 m)            2498       292     0.11        0.12     0.91 
Mean Depth (1000 m)           2637       153     0.1         0.12     0.9 
Categorical: reef vs bathyal  1922       572     0.11        0.13     0.86 
Minimum depth > 200 m         2677       179     0.12        0.123    0.86 
Extended Data Figure 7 | Speciation rates in deep-sea fishes and the 
phylogenetic structure of high-latitude fish diversity. a, Formal test 
of the relationship between speciation rate and depth classification for 
tropical fishes. ‘Classification’ is the criterion used to define fishes 
as deep sea versus shallow water; mean depth (200 m) thus classifies all 
fishes with mean depth greater than 200 m as deep sea. Among tropical 
fishes, there is no effect of depth state on speciation rate. b, Phylogenetic 
composition of high-latitude fish diversity by taxonomic order, across 
all marine fishes (top) and for the subset of species with genetic data 
(bottom). High latitude is defined as having a centroid midpoint greater 
than 45° north or south. Only the three most species-rich high-latitude 
orders are labelled. Most high-latitude marine fishes are Perciformes. 
c, Phylogenetic and geographical structure of the diversity of Perciformes. 
The latitudinal range of each perciform species in the phylogenetic 
dataset is shown, along with the corresponding speciation rate (λ_BAMM). 
Latitudinal ranges from species with speciation rates that are faster and 
slower than the median rate are shown in red and blue, respectively. 
High-latitude and rapidly speciating clades are nested within slowly speciating 
tropical lineages, and speciation rates for high-latitude taxa of Perciformes 
are higher than those observed in tropical lineages. Mean speciation rates 
for high-latitude species (>45°, n = 376) are faster than those observed for 
tropical (<25°, n = 287) species (tropical: λ_DR = 0.16, λ_BAMM = 0.15; high 
latitude: λ_DR = 0.30, λ_BAMM = 0.23). For polar species (>60°, n = 105), 
these rate differentials are even more extreme, with mean λ_DR = 0.38 and 
λ_BAMM = 0.31. 
Extended Data Figure 8 | Latitudinal gradient in speciation rate for 
cell assemblages inferred from occurrence data. Cell assemblages 
(n = 843) and species latitudinal midpoints were inferred from a 
non-redundant merge of four primary occurrence-based biodiversity databases 
(GBIF, OBIS, Fishnet2 and VertNet). a, λ_BAMM for cell assemblages as a 
function of latitude. b, λ_DR as a function of latitude. c, SAR spatial error 
models for the effects of absolute latitude on mean speciation rates for 
grid cells. AIC1 is the AIC for a linear model with a single slope and intercept term; 
AIC2 is the corresponding AIC for a breakpoint model that assumes no 
relationship (slope = 0) between absolute latitude and speciation rate for 
all values below some threshold, and a linear relationship for latitudes 
that exceed the threshold. All other column headings as in Extended 
Data Fig. 2g. Results indicate a strong effect of latitude on speciation 
rate and are nearly identical to results obtained using the dataset of the 
primary map. d, Effects of absolute latitudinal midpoint for individual 
taxa on corresponding tip speciation rates, as assessed using FiSSE. Each 
row gives the results of FiSSE using a different threshold for classifying 
lineages as tropical and temperate. λ_0 and λ_1 denote estimated speciation 
rates (similar to λ_DR) for tropical and temperate lineages, respectively. 
All column headings are identical to those shown in Extended Data 
Table 1. Results are nearly identical to those obtained using explicit range 
reconstructions and reveal a pervasive effect of latitude on lineage-level 
speciation rates, regardless of the threshold used to classify species. 
[Figure panel c, recovered table columns] 
(heading not recovered)  P-value 
255.96                   < 0.0001 
1.45                     0.048 
206.71                   < 0.0001 
2.74                     0.021 

[Figure panels: e, Tropical taxa; f, High latitude taxa; axis: (log) λ_DR] 

Extended Data Fig. 9 | Additional checks of statistical robustness. 
a, Relationship between terminal branch lengths and absolute latitudinal 
midpoint; means are shown for all species falling into a given bin 
(±2.5° from the focal value, n = 15). Mean branch lengths decrease 
with increasing latitude, reflecting faster speciation at high latitudes. 
b, Relationship between the estimated speciation rate for each taxon 
(λ_DR, n = 5,155) and the sampling fraction for the corresponding 
family-level clade to which the taxon belongs; the sampling fraction is simply 
the percentage of known taxa from the family that were represented in 
the phylogenetic dataset with genetic data. There is no clear relationship 
between the sampling fraction and the estimated speciation rates. 
c, Multiple regression analysis (OLS) of the relationship between 
taxon-specific speciation rate (λ_BAMM or λ_DR) and two predictors (latitude 
and family-level sampling fraction) in a multiple regression framework 
(n = 5,155). If the relationship between speciation rate and latitude is 
driven by progressively greater (or lower) genetic taxon sampling as a 
function of latitude, the sampling fraction term should explain a large 
fraction of the overall sums of squares. Even when sampling fraction is 
included as a covariate, the overwhelming fraction of variance is explained 
by latitude. For both λ_DR and λ_BAMM, more than 98% of the total sums 
of squares is explained by latitude and not sampling. d–f, Test for the 
effects of molecular evolutionary rate variation and latitudinal bias in 
speciation rate. d, Relationship between root-to-tip branch length sum for 
uncalibrated (non-ultrametric) RAxML phylogeny and midpoint latitude 
for each marine taxon (n = 5,149). e, f, Relationship between root-to-tip 
distance and λ_DR. There is effectively no relationship between the total 
path length for individual tips and their absolute latitudinal midpoint 
(Pearson r = 0.020). Plots in e and f emphasize tropical (midpoint latitude 
<25°; n = 3,481; red) and temperate-polar (midpoint latitude >45°; 
n = 567; blue) taxa, respectively; all other taxa are shown in grey. The overall 
relationship between (log) λ_DR and the rate of molecular evolution 
(root-to-tip sum) is weak but positive (Pearson r = 0.130) and inconsistent with 
the hypothesis that slow rates of molecular evolution at high latitudes 
result in fast but spurious estimates of speciation rate. 
Extended Data Table 1 | Effects of absolute latitudinal midpoint on speciation rates 

Threshold  λ_1    λ_0    p        Null Δ (mean)  Null σ  N_parsimony  q_parsimony 
23.5       0.119  0.187  < 0.001  0.00122        0.022   600          0.007 
25         0.12   0.19   < 0.001  −0.00098       0.022   576          0.007 
30         0.122  0.2    < 0.001  −0.00035       0.023   468          0.005 
35         0.126  0.205  < 0.001  −0.00007       0.025   388          0.004 
40         0.128  0.223  < 0.001  0.00043        0.028   263          0.002 
45         0.129  0.246  < 0.001  0.00039        0.03    201          0.002 
50         0.132  0.267  < 0.001  0.00034        0.031   185          0.002 
55         0.136  0.275  < 0.001  0.00174        0.034   132          0.001 
60         0.137  0.353  < 0.001  0.00061        0.041   74           0.001 

The effect of latitude on diversification was assessed using FiSSE, a method for inferring the effects of a binary character on lineage diversification rates. Each row gives the results of FiSSE using a 
different threshold for classifying lineages as tropical and temperate. λ_0 and λ_1 denote estimated speciation rates (similar to λ_DR) for tropical and temperate lineages, respectively. P values indicate 
the proportion of simulations with a rate difference (λ_1 − λ_0) that is greater than the observed difference. The number of parsimony-reconstructed changes between states 0 and 1 is given by 
N_parsimony; q_parsimony denotes the empirically estimated transition rate used to generate the null distribution. Results are based on 2,000 simulations; the observed difference in rates exceeded all simulated 
values, regardless of threshold. 
Hot streaks in artistic, cultural, and scientific careers 


Lu Liu, Yang Wang, Roberta Sinatra, C. Lee Giles, Chaoming Song & Dashun Wang 


The hot streak, loosely defined as ‘winning begets more 
winnings’, highlights a specific period during which an individual's 
performance is substantially better than his or her typical 
performance. Although hot streaks have been widely debated in 
sports, gambling and financial markets over the past several 
decades, little is known about whether they apply to individual 
careers. Here, building on rich literature on the lifecycle of 
creativity, we collected large-scale career histories of individual 
artists, film directors and scientists, tracing the artworks, films 
and scientific publications they produced. We find that, across 
all three domains, hit works within a career show a high degree of 
temporal regularity, with each career being characterized by bursts 
of high-impact works occurring in sequence. We demonstrate that 
these observations can be explained by a simple hot-streak model, 
allowing us to probe quantitatively the hot streak phenomenon 
governing individual careers. We find this phenomenon to be 
remarkably universal across diverse domains: hot streaks are 
ubiquitous yet usually unique across different careers. The hot 
streak emerges randomly within an individual’s sequence of works, 
is temporally localized, and is not associated with any detectable 
change in productivity. We show that, because works produced 
during hot streaks garner substantially more impact, the uncovered 
hot streaks fundamentally drive the collective impact of an 
individual, and ignoring this leads us to systematically overestimate 
or underestimate the future impact of a career. These results not 
only deepen our quantitative understanding of patterns that govern 
individual ingenuity and success, but also may have implications for 
identifying and nurturing individuals whose work will have lasting 
impact. 

According to the Matthew effect, victories bring reputation and 
recognition that can translate into tangible assets, which in turn help to 
bring future victories. This school of thought supports the existence of a 
hot streak in a career, which is also consistent with literature in the field 
of innovation showing that peak performance clusters in time, typically 
occurring around the middle of a career. On the other hand, the 
random impact rule uncovered in the arts and sciences predicts 
the opposite: the best works occur randomly within a career, and their 
occurrence is primarily driven by productivity. Following this school 
of thought, works after a major breakthrough are not affected by what 
preceded them, supporting the viewpoint of regression towards the 
mean. The two divergent schools of thought raise a fundamental ques- 
tion: do hot streaks exist in creative careers? 

To answer this question, we collected data sets recording the 
career histories of individual artists, film directors and scientists 
(Supplementary Information S1) and traced the impact of the artworks, 
films and papers they produced, approximated by auction prices", 
IMDB ratings (https://www.imdb.com/)” and citations garnered after 
10 years of publication (C_10)!>"%18°, respectively (see Methods). We 
started by investigating the timing of the three most impactful works 
produced in each career. In a sequence of N works by an individual, 
we denoted with N* the position of the highest-impact work within a 
career, N** the second highest and N*** the third. We found that each 
of the three highest-impact works was randomly distributed among 
all the works produced by an individual (Extended Data Fig. 1a-c), 
offering strong endorsement for the random impact rule'®'8?!. 

However, as we show next, the randomness in individual creativity 
is only apparent, because the timing between creative works follows 
highly predictable patterns. We measured the correlation between the 
timing of the two biggest hits within a career, and compared it with a 
null hypothesis in which N* and N** each occurred at random. The normalized 
joint probability, φ(N*, N**) = P(N*, N**)/(P(N*)P(N**)), is 
substantially overrepresented along the diagonal elements of matrices 
(Fig. 1a-c), demonstrating that N* and N** are much more likely to 
colocate with each other than would be expected from the random 
impact model across three domains. The diagonal pattern disappears 
if we shuffle the order of works within each career, thereby breaking 
the temporal correlations (Extended Data Fig. 1j-r). 
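
The normalized joint probability can be estimated directly from binned positions of the two biggest hits. The following is a minimal Python sketch rather than the authors' code: the function names, the ten-bin grid on the relative position N*/N, and the tie-breaking by earliest work are assumptions made for illustration.

```python
import random
from collections import Counter


def top_two_positions(impacts):
    """Relative positions (N*/N, N**/N) of the two highest-impact works."""
    n = len(impacts)
    order = sorted(range(n), key=lambda i: impacts[i], reverse=True)
    return (order[0] + 1) / n, (order[1] + 1) / n


def bin_index(x, n_bins=10):
    """Map a relative position in (0, 1] to a bin index in 0..n_bins-1."""
    return min(int(x * n_bins), n_bins - 1)


def phi_matrix(careers, n_bins=10):
    """phi[a][b] = P(a, b) / (P(a) P(b)) over binned positions of N* and N**."""
    joint, first, second = Counter(), Counter(), Counter()
    for impacts in careers:
        a, b = (bin_index(x, n_bins) for x in top_two_positions(impacts))
        joint[(a, b)] += 1
        first[a] += 1
        second[b] += 1
    total = len(careers)
    phi = [[0.0] * n_bins for _ in range(n_bins)]
    for (a, b), count in joint.items():
        phi[a][b] = (count / total) / ((first[a] / total) * (second[b] / total))
    return phi


def shuffle_careers(careers, seed=0):
    """Shuffle the order of works within each career, keeping impacts intact."""
    rng = random.Random(seed)
    return [rng.sample(c, len(c)) for c in careers]
```

Comparing `phi_matrix(careers)` with `phi_matrix(shuffle_careers(careers))` mirrors the real-versus-shuffled comparison described above; values above 1 on the diagonal indicate colocation.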

To quantify the temporal colocation of hits observed in Fig. 1a-c, 
we calculated the distance between the two highest-impact works 
for every individual, measured by the number of works produced in 
between, ΔN = N* - N**. We compared P(ΔN/N) of real careers with 
P_s(ΔN/N) of shuffled careers by defining R(ΔN/N) = P(ΔN/N)/P_s(ΔN/N). 
For artists, directors, and scientists, all R(ΔN/N) exhibit a clear peak 
centring around zero and decay quickly as ΔN deviates from zero (Fig. 1d-f). 
Notably, R(ΔN/N) is mostly symmetric around zero (Fig. 1d-f), indicating 
that the biggest hit is equally likely to arrive before or after the 
second biggest. The colocation patterns are not limited to the two highest- 
impact works within a career. We repeated our analyses for other pairs of 
hit works, such as N** versus N*** and N* versus N***, and uncovered 
the same colocation patterns (Extended Data Fig. 1d-i). 
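
The ratio between real and shuffled distributions of the normalized distance between the two biggest hits can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the helper names, the 21 equal-width bins on [-1, 1], and the use of a single shuffle per career are choices made here, not details taken from the paper.

```python
import random
from collections import Counter


def delta_n_over_n(impacts):
    """(N* - N**) / N for one career, with positions counted in works."""
    n = len(impacts)
    order = sorted(range(n), key=lambda i: impacts[i], reverse=True)
    return (order[0] - order[1]) / n


def binned_distribution(values, n_bins=21):
    """Normalized histogram of values in [-1, 1] using equal-width bins."""
    counts = Counter(
        min(int((v + 1.0) / 2.0 * n_bins), n_bins - 1) for v in values
    )
    total = len(values)
    return [counts[b] / total for b in range(n_bins)]


def ratio_r(careers, n_bins=21, seed=0):
    """Bin-wise ratio of the real to the shuffled distribution (None if undefined)."""
    rng = random.Random(seed)
    real = [delta_n_over_n(c) for c in careers]
    shuffled = [delta_n_over_n(rng.sample(c, len(c))) for c in careers]
    p = binned_distribution(real, n_bins)
    p_s = binned_distribution(shuffled, n_bins)
    return [a / b if b > 0 else None for a, b in zip(p, p_s)]
```

With an odd number of bins, the central bin is centred on zero, so a peak there corresponds to the two biggest hits sitting close together in the sequence of works.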

Do high-impact works come in streaks within a career? We counted 
the number of consecutive works whose impact exceeded the median 
of all works within a career (Extended Data Fig. 2d-f). We calculated 
the length of the longest streak L for each career. We then shuffled the 
order of works within each career, and measured again their longest 
streaks L_s. P(L) was characterized by a much longer tail than P(L_s) 
(Fig. 1g-i), indicating that real careers are characterized by long streaks 
of relatively high-impact works clustered together in sequence. We 
tested the robustness of these results by controlling for individual career 
length, and by varying our threshold used to calculate L, and arrived 
at the same conclusions (Extended Data Figs. 2-4, Supplementary 
Information S2). Together, these results raise an important question: 
what mechanisms are responsible for the temporal regularities observed 
across diverse career histories? 
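
The streak measurement described above can be sketched in a few lines of Python. The function names are hypothetical, and treating "exceeded the median" as a strict inequality is an assumption.

```python
import random
import statistics


def longest_streak(impacts, threshold=None):
    """Longest run of consecutive works whose impact exceeds the threshold.

    The threshold defaults to the median impact within the career.
    """
    if threshold is None:
        threshold = statistics.median(impacts)
    best = current = 0
    for value in impacts:
        current = current + 1 if value > threshold else 0
        best = max(best, current)
    return best


def shuffled_longest_streak(impacts, rng):
    """Longest streak after shuffling the order of works within the career."""
    shuffled = list(impacts)
    rng.shuffle(shuffled)
    return longest_streak(shuffled)
```

Collecting `longest_streak` over all real careers and `shuffled_longest_streak` over their shuffled counterparts gives the two distributions whose tails are compared.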

Fig. 1 | Hot streaks in artistic, cultural and scientific careers. a-c, φ(N*, 
N**), colour coded, measures the joint probability of the two highest- 
impact works within a career for artists (a), directors (b), and scientists (c). 
φ > 1 indicates that two hits are more likely to colocate than would be 
expected at random. d-f, R(ΔN/N) measures the temporal distance between 
highest-impact works relative to the null model’s prediction. Real careers 
show a clear peak around 0 (red dots), which is well captured by the hot- 
streak model (solid lines). Different shades of red correspond to different 
pairs of hit works. Blue dots denote the same measurement but on shuffled 
careers, and blue lines are predictions from shuffled careers generated by 
our model. g-i, The distribution of the length of streaks P(L) for real 
careers and P(L_s) for shuffled careers. The hot-streak model (red lines) and 
its shuffled version (blue lines) closely reproduce P(L) observed in real 
(red dots) and shuffled careers (blue dots). 

Let us first consider a null model in which the goodness of works 
produced in a career (that is, log(price) for artists, ratings for directors, 
and log(C_10) for scientists) is drawn from a normal distribution 
N(Γ_i, σ_i²) that is fixed for an individual. The average Γ_i characterizes 
the typical impact of works produced by the individual, and σ_i captures 
the variance. This null model can reproduce the fact that each hit 
occurs randomly within a career'®!8. However, it fails to capture any of 
the temporal correlations observed in Fig. 1. The main reason for this 
failure is illustrated in Fig. 2a-c, where we selected for illustration purposes 
one individual from each of the three data sets and measured the 
dynamics of Γ_i during his or her career. We find that Γ_i is not constant 
throughout a career. Rather, it deviates from a baseline performance 
(Γ_0) at a certain point in a career (t↑), elevating to a higher value Γ_H 
(Γ_H > Γ_0), which is then sustained for some time before falling back to 
a level similar to Γ_0 (Fig. 2a-c): 

$$\Gamma(t) = \begin{cases} \Gamma_H, & t_\uparrow \le t \le t_\downarrow \\ \Gamma_0, & \text{otherwise} \end{cases} \qquad (1)$$


This observation, combined with the shortcomings of the null model, 
raises an intriguing question: could a simple model based on equation 
(1) explain the temporal anomalies documented in Fig. 1? 

Fig. 2 | The hot-streak model. a-c, Γ(N) for one artist (a), film director (b) 
and scientist (c), selected for illustration purposes. See Extended Data 
Fig. 9 for randomly selected careers. d-f, Histogram of the number of 
hot streaks in a career. We also measured several performance metrics 
for individuals who had one or two hot streaks, and found no detectable 
difference (Extended Data Fig. 10). g-i, The distributions of durations 
of hot streaks P(τ_H). Red lines are log-normal fits as a visual guide. The 
insets show cumulative distributions of N↑/N, indicating that the start of a 
hot streak N↑ is distributed randomly among N works in a career. j-l, The 
distributions of the number of works produced during hot streaks P(N_H), 
compared with a null distribution in which we randomly pick one work as 
the start of the hot streak. j, Artists (n = 3,166). k, Directors (n = 5,098). 
l, Scientists (n = 18,121). Two-sided Kolmogorov-Smirnov tests indicate 
that we cannot reject the hypothesis that the two distributions are drawn 
from the same distribution (P = 0.12 for artists, P = 0.12 for directors, and 
P = 0.17 for scientists). 

To test this hypothesis, we applied equation (1) to real productivity 
patterns, allowing us to generatively simulate the impacts of the works 
produced by an individual (Supplementary Information S3.3). During 
the period in which Γ_H operates, the individual seemingly performs at 
a higher level than his or her typical performance (Γ_0), prompting us to 
call this model the hot-streak model (where the Γ_H period corresponds 
to the hot streak). We introduced to each career one hot streak that 
occurred at random with a fixed duration and magnitude, and repeated 
our measurements in Fig. 1 on careers generated by the model. We find 
that, whereas equation (1) introduces only a simple temporal variation, 
the hot-streak model is sufficient to reproduce all empirical patterns 
observed in Fig. 1 (Fig. 1d-i and Extended Data Fig. 1s-u). Given the 
myriad factors that can affect career impacts? 12-1822,27-30, and the obvious 
diversity of careers we studied, the level of universality and accuracy 
demonstrated by the simple hot-streak model was unexpected. 
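
A generative simulation of this kind can be sketched as below. This is a simplified illustration, not the procedure of Supplementary Information S3.3: it places the hot streak by position in the sequence of works rather than in calendar time, draws impacts from a Gaussian with a common spread, and uses hypothetical parameter names throughout.

```python
import random


def simulate_career(n_works, gamma_0, gamma_h, sigma, streak_len, rng):
    """Simulate the impacts of one career with a single hot streak.

    A block of `streak_len` consecutive works, starting at a uniformly random
    position, is drawn with mean gamma_h; all other works use mean gamma_0.
    Returns the list of impacts and the index at which the streak starts.
    """
    start = rng.randrange(0, n_works - streak_len + 1)
    impacts = []
    for k in range(n_works):
        in_streak = start <= k < start + streak_len
        mean = gamma_h if in_streak else gamma_0
        impacts.append(rng.gauss(mean, sigma))
    return impacts, start


if __name__ == "__main__":
    rng = random.Random(42)
    impacts, start = simulate_career(
        n_works=30, gamma_0=1.0, gamma_h=2.0, sigma=0.5, streak_len=6, rng=rng
    )
    print(f"hot streak starts at work {start + 1}")
```

Feeding many simulated careers into the same colocation and streak-length measurements used on real careers is one way to check whether a single elevated period is enough to produce the observed patterns.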

The real value of the model arises, however, when we fit the model to 
real careers to obtain the individual-specific Γ_0, Γ_H, t↑ and t↓ parameters 
(Supplementary Information S3.4), helping us to reveal several fundamental 
patterns that govern individual careers. 

1. Hot streaks are ubiquitous across careers, yet at the same time usu- 
ally unique within a career. The vast majority of artists (91%, Fig. 2d), 
film directors (82%, Fig. 2e) and scientists (90%, Fig. 2f) have at least 
one hot streak during their careers, documenting the practical rele- 
vance of the uncovered hot streak phenomenon. However, despite its 
ubiquity, the hot streak is likely to be unique within a career. Indeed, 
when we relaxed our fitting algorithm to allow for multiple hot streaks 
(up to three) with different values of Γ_H, we found that, among those 
who had a hot streak, 64% of artists, 80% of directors, and 68% of scientists 
were best captured by one hot streak only (Fig. 2d-f), documenting 
the precious nature of hot streaks. Second acts may occur but are less 
likely, particularly for film directors. Occurrences of more than two hot 
streaks are rare across all careers. 

Fig. 3 | ... career. a, Citation patterns of papers published by a randomly selected 
scientist in our data set. The publication dates are rearranged such that 
the individual produces a constant number of papers each year (coloured 
vertical lines, see Methods). Inset, collective impact of the individual, 
representing the sum of citation dynamics of all of his or her papers. 
b, Varying hot-streak parameters allows us to reproduce a wide variety 
of career dynamics (red lines) that cannot be captured by the null model 
(blue line). For hot-streak parameters in equation (4) (Methods), here 
we use μ = 7.0, σ = 1.0, Γ_0 = 1.0 and τ_H = 3 years, but vary t↑ and Γ_H. 
Inset, g(t) can be decomposed into g_0(t) and Δg(t). c, g(t) of 50 randomly 
selected scientists. Colour corresponds to a career’s starting year, dots 
denote real data, and solid lines capture the predictions from the 
hot-streak model. 


2. The hot streak occurs randomly within a career. We estimate the 
beginning of hot streaks by measuring N↑, the position of the work produced 
when a hot streak starts (t↑). We find that, across artistic, cultural, 
and scientific careers, N↑ is randomly distributed in the sequence 
of N works within a career (Fig. 2g-i, insets). This finding reconciles 
two seemingly divergent schools of thought”!*8, providing a further 
explanation for the random impact rule: if the hot streak occurs randomly 
within a career, and the highest-impact works are statistically 
more likely to appear within a hot streak, then the timing of the 
highest-impact works is also random. 

3. Across different domains, hot streaks are considerably shorter than 
the typical career length recorded in our database. We measure the 
duration distribution of hot streaks (τ_H = t↓ - t↑), finding that P(τ_H) peaks 
around 5.7 years for artists, 5.2 years for directors, and 3.7 years for 
scientists, and that the duration is largely independent of when the hot 
streak occurs within a career (early, mid or late career; Fig. 2g-i). 

4. Unexpectedly, individuals are not more productive during hot 
streaks. We measured the distribution of the total number of works 
produced during hot streaks P(N_H). We then constructed a null distribution 
by randomly picking one work in a career and designating its 
production year to be the start of the hot streak. We found that the two 
distributions aligned well with each other (Fig. 2j-l). Therefore, indi- 
viduals show no detectable change in productivity during hot streaks, 
despite the fact that their outputs in this period are significantly better 
than the median, suggesting that there is an endogenous shift in indi- 
vidual creativity when the hot streak occurs. For additional properties 
of hot streaks, see Methods and Extended Data Fig. 5. 

To investigate the impact of hot streaks on individual careers, we 
focused on scientific careers and measured the collective impact of a 
scientist, g(t), defined as the total number of citations over time col- 
lected by all papers published by an individual (Fig. 3a). g(t) can be 
derived analytically by combining the hot-streak model (equation (1)) 
and an existing model'® for the citation patterns of papers (see Methods 
and Supplementary Information S5), consisting of two terms: 


$$g(t) = g_0(t) + \Delta g(t) \qquad (2)$$

g_0(t) captures a career’s collective impact in the absence of a hot streak 
(that is, Γ(t) = Γ_0). Contributions from the hot streak are encoded 
in Δg(t), driven by both the timing and magnitude of hot streaks 
(see Methods). Varying hot-streak parameters leads to substantial 
changes in the collective impact of a career (Fig. 3b). Hence the hot- 
streak model captures a wide range of impact trajectories that are 
followed by real careers (Fig. 3c), and the accuracy of the model is 
documented by several metrics (see Methods). Given that individuals 
improve substantially during hot streaks, the uncovered phenomena 
can be particularly crucial for understanding the long-term impact of 
a career (Extended Data Fig. 6). 

We further tested several alternative hypotheses, each associated 
with possible origins of the uncovered hot streaks (see Methods and 
Supplementary Information S6). Of all hypotheses considered, the hot- 
streak model is the simplest and least flexible. However, it is the only 
model whose predictions are consistent with real careers (Extended 
Data Figs 7, 8). The identification of the true origins of hot streaks is 
beyond the scope of this work. As such, the hot streaks uncovered here 
should be treated in a metaphorical sense, highlighting an intriguing 
period of outstanding performance during individual careers without 
implying any associated drivers. Crucially, though, the findings pre- 
sented here hold the same, regardless of the underlying drivers. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 

METHODS 


Data description. We compiled three large-scale data sets of individual careers 
across three major domains involving human creativity. The first data set (D1) 
consists of auction records curated from online auction databases, allowing us to 
reconstruct the career histories of 3,480 artists through the sequence of works they 
each produced, together with the impacts of the artworks, approximated by hammer 
prices in auctions!>. The second data set (D2) contains profiles of 6,233 film 
directors recorded in the IMDB database, each career being represented by the 
sequence of films directed by the individual. As metrics that quantify the impact 
of a film correlate closely with each other”, here we use the IMDB rating to 
measure the goodness of a film. 
Finally, our third data set (D3) includes publication records of 20,040 individual 
scientists through a large-scale name disambiguation effort that combined the Web 
of Science (https://clarivate.com/products/web-of-science/) and Google Scholar 
(https://scholar.google.co.uk/) data sets. The impact of each paper is measured by 
citations garnered after 10 years of its publication!>!©!86 (C_10). Further details 
on data collection and curation are provided in Supplementary Information S1. 

To study the impact of works across the three domains, we measured the 
distributions of hammer prices, IMDB ratings and paper citation counts in our data 
sets. Both hammer price (D1) and C_10 (D3) follow fat-tailed distributions, well 
approximated by a log-normal function (Extended Data Fig. 2a, c), and the IMDB 
rating follows a normal distribution ranging between 1 and 10 (Extended Data 
Fig. 2b). To make sure C_10 is not affected by citation inflation’**3!, we also 
measured a rescaled C_10 (see Supplementary Information S1.3) and found that it also 
followed a fat-tailed distribution (Extended Data Fig. 2c, inset). Therefore, we take 
the logarithm of hammer price and C_10 (log(price) and log(C_10)) to approximate 
the goodness of an artwork and scientific publication. Note that the choice of 
logarithm for hammer prices and C_10 is meant to be consistent with prior studies'*', 
and does not affect any of the conclusions of the paper. Indeed, the logarithmic 
function is a monotonically increasing function, hence it does not change the 
rank ordering of top hits in a career. Note that while the data sets we used in this 
paper cover a large collection of career histories across a wide range of domains, 
the data-driven nature of our study indicates that the scope of our data is limited to 
individuals who have had sufficiently long careers to provide enough data points 
for statistical analyses (Supplementary Information S1). 
Additional properties of hot streaks. How much does an individual deviate from 
his or her typical performance during a hot streak? Do people with higher Γ_0 also 
experience more performance gain from hot streaks? We explored correlations 
between Γ_0 and Γ_H, finding them to be well approximated by a linear relationship 
(Extended Data Fig. 5a-c). Hence, individuals with better typical performance also 
perform better during their hot streaks. It is interesting to note that the coefficients 
are slightly less than 1 (Extended Data Fig. 5a-c). Hence ΔΓ = Γ_H - Γ_0 decreases 
with Γ_0 (Extended Data Fig. 5a-c, insets), suggesting that individuals with smaller 
Γ_0 benefit more from hot streaks. These results are again independent of when the 
hot streak occurs along a career (Extended Data Fig. 5a-c). 

The temporally localized nature of the hot streak is also captured by its proportion 
over career length τ_H/T (Extended Data Fig. 5d-f). We compared the duration 
of hot streaks with typical career length, finding that the median hovers around 
20% (0.17 for artists, 0.23 for directors, and 0.20 for scientists). 
Analytical solutions for the collective impact of a scientific career, g(t). Brought 
into the spotlight by popular websites such as Google Scholar, g(t) is playing 
an increasingly important role in driving many critical decisions, from hiring, 
promotion and tenure to awarding of grants and rewards. Many factors are 
known to influence it, ranging from productivity!” to citation disparity and 
dynamics!*!*16.2329 and temporal inhomogeneities along a career! )17182122:30. 
As our goal is to understand impact, here we bypass the need to evaluate the 
inhomogeneous nature of productivity'”"* by rearranging the publication time 
of each paper, such that an individual produces a constant number of papers 
each year, denoted by n (Fig. 3a). To calculate g(t), we need to incorporate the 
citation patterns of papers into our hot-streak model (equation (1)). A recent 
study!® suggested that the citation dynamics of a paper published at time t_0 can 
be approximated by 


$$c(t; t_0) = m\left[\exp\!\left(\lambda\,\Phi\!\left(\frac{\ln(t - t_0) - \mu}{\sigma}\right)\right) - 1\right] \qquad (3)$$

where m is a global parameter describing the typical number of references a paper 
contains, and Φ(·) is the cumulative normal function, characterized by μ and σ, 
which capture the typical citation life cycle of a paper. The paper’s ultimate impact 
is determined by its fitness!°, λ. To adapt equation (3) into our framework, we 
replace λ with Γ(t_0), and for simplicity assume that μ and σ are fixed for different 
papers published by an individual. The resulting model, combining equations (1) 
and (3), can be solved analytically (Supplementary Information S5), allowing us 
to express g(t) in terms of hot-streak parameters: 


$$g(t) = g_0(t) + \Delta g(t), \qquad \Delta g(t) = \begin{cases} 0, & t < t_\uparrow \\ \ldots, & t_\uparrow \le t \le t_\downarrow \\ \ldots, & t > t_\downarrow \end{cases} \qquad (4)$$


Equation (4) consists of two terms. g_0(t) captures a career’s collective impact in 
the absence of hot streaks (that is, Γ(t) = Γ_0). Contributions from the hot streak 
are encoded in Δg(t), driven by both the timing and magnitude of hot streaks 
(t↑, t↓, τ_H, and Γ_H - Γ_0). 
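
The collective impact can also be evaluated numerically by summing per-paper citation curves, which gives a useful check on any closed-form expression. The Python sketch below is illustrative only. It assumes the per-paper curve has the form m[exp(λΦ((ln(t - t_0) - μ)/σ)) - 1] with λ replaced by Γ(t_0), that papers appear at a constant rate of n per year starting at t = 0, and that time is measured in years; all function and parameter names are hypothetical.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def paper_citations(t, t0, fitness, m, mu, sigma):
    """Cumulative citations at time t of a paper published at time t0."""
    if t <= t0:
        return 0.0
    z = (math.log(t - t0) - mu) / sigma
    return m * (math.exp(fitness * normal_cdf(z)) - 1.0)


def collective_impact(t, n_per_year, gamma_0, gamma_h, t_up, t_down, m, mu, sigma):
    """Numerical g(t): sum of citation curves over all papers published by time t.

    Papers published while t_up <= t0 <= t_down use fitness gamma_h,
    all others use gamma_0 (the hot-streak model).
    """
    total = 0.0
    for k in range(int(t * n_per_year) + 1):
        t0 = k / n_per_year
        fitness = gamma_h if t_up <= t0 <= t_down else gamma_0
        total += paper_citations(t, t0, fitness, m, mu, sigma)
    return total
```

Setting `gamma_h` equal to `gamma_0` recovers the no-streak baseline, so the difference between the two calls plays the role of the hot-streak contribution.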

Evaluating the accuracy of the hot-streak model. We quantify the accuracy of 
our model in equation (4) using three metrics. 

To account for the inherently noisy career trajectories, we first assign an impact 
envelope to each career, explicitly quantifying the uncertainty of model predictions 
(Extended Data Fig. 6g). We simulated g(t) for each individual by assigning a 
Gaussian noise N(0, σ) to the fitted Γ_0 and Γ_H. For each paper i we randomly 
draw its Γ_i from a normal distribution, depending on whether the paper was 
published within the hot streak (Γ_H during hot streak, Γ_0 otherwise). The standard 
deviation σ represents the inherent noise of the goodness parameter defined in 
Supplementary Information S3.5. For each individual, we simulated g(t) for 1,000 
realizations, allowing us to obtain a distribution of g(t), with one standard devia- 
tion offering an uncertainty envelope. We repeated the same procedures for the 
null model. We measure the fraction of g(t) that falls within the envelope, finding 
that the distribution peaks close to 1 (Extended Data Fig. 6h), which indicates that 
most career trajectories are well encapsulated within the predicted envelopes. 

The superior accuracy of our model is also captured by the mean absolute 
percentage error (MAPE). We compared the distribution of MAPE between the 
data and the predictions of the model (Extended Data Fig. 6f), finding again that 
the hot-streak model outperformed the null model. The improvement was most 
pronounced for an early onset of hot streaks (Extended Data Fig. 6i), which is also 
consistent with our model’s predictions. 

To account for model complexity, we also calculated the Bayesian information 
criterion (BIC) measure, which penalizes the number of parameters in the model. 
Compared with the null model, the hot-streak model has systematically smaller 
BIC (Extended Data Fig. 6e), documenting that the hot-streak model better 
captures the collective impact of a career than the null model. 
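
The two error-based metrics can be computed as in the following Python sketch. The text does not give the exact formulas, so both are assumptions: MAPE is taken over points with non-zero observations, and BIC uses the common Gaussian-error form n ln(RSS/n) + k ln(n), where k is the number of fitted parameters.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, skipping zero-valued observations."""
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(terms) / len(terms)


def bic_gaussian(observed, predicted, n_params):
    """BIC under an assumed Gaussian error model: n ln(RSS/n) + k ln(n)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + n_params * math.log(n)
```

Under this form, a model with more parameters has a smaller BIC only if its fit improves enough to outweigh the k ln(n) penalty.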

Implications of hot streaks for long-term career impact. The analytical frame- 
work presented here not only offers a new theoretical basis for our quantitative 
understanding of dynamical patterns governing individual career impact, but also 
may have implications for comparing and evaluating scientists (Extended Data 
Fig. 6). Indeed, for individuals whose hot streaks are yet to come, ignoring the 
hot streak may lead to underestimation of their impacts (Extended Data Fig. 6a, 
b), especially given the ubiquitous nature of hot streaks (Fig. 2f). On the other 
hand, an early onset of a hot streak leads to a high impact that peaks early but 
may not be sustained unless a second hot streak occurs (Extended Data Fig. 6c). 
Testing alternative hypotheses. To explore the possibility that alternative hypotheses 
might explain the observed patterns, we tested several models that capture different 
dynamics of hot streaks (Supplementary Information S6.3), each associated 
with possible origins of the uncovered hot streaks. (A) A right trapezoid (Extended 
Data Fig. 7b) captures a sudden onset of a hot streak with a more gradual decline, 
as innovators may stumble upon a groundbreaking idea, which manifests itself in 
the forms of multiple artworks, films, or publications. Hence from an evolutionary 
perspective, the duration of a hot streak may characterize the time it takes for the 
temporary competitive advantage to dissipate. (B) An isosceles trapezoid model 
(Extended Data Fig. 7c) captures hot streaks that evolve and dissolve gradually 
over time, which may approximate social tie dynamics, as one individual’s hot 
streak could be the result of a fruitful, repeated collaboration’”**. (C) Furthermore, 
individual performance may peak at a certain point in a career, prompting us to 
test inverted-U shape (Extended Data Fig. 7d) and tent functions (Extended Data 
Fig. 7e). (D) Last, a left trapezoid function (Extended Data Fig. 7f) captures a 
gradual startup period with a sharp cutoff, corresponding to career opportunities 
that can augment impact but last for a fixed duration. 

We tested the validity of the four alternative hypotheses (A-D) by comparing 
each model's prediction with empirical observations on the relative order of the 
top six hits within a career. The symmetric patterns of φ and R(ΔN/N) observed in 
real careers suggest that the biggest hit is equally likely to appear before or after the 
second-biggest hit. The randomness of the relative ordering among hits is not 
limited to the two biggest hits. Indeed, we measured the position of the top three 
hits (N*, N**, N***) relative to the top six hits of the career, and computed P(N) for each 
of the three hits for artists, directors and scientists. We found a lack of predictive 
patterns for P(N) across the three domains, suggesting that the relative orders 
among the top six hits in real careers are random (Extended Data Fig. 7g, o, w). We 
tested hypotheses A-D systematically to describe real careers (Supplementary 
Information S6.3), and found that the hot-streak model was the only model whose 
predictions were consistent with real careers (Extended Data Fig. 7h-m, p-u, x-ac). 
As such, the hot-streak model also offers a superior fit to the data than the other 
models (Extended Data Fig. 7n, v, ad). 
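
The relative-order measurement used to compare the models can be sketched as follows. This is an illustrative Python helper with a hypothetical name. It assumes a career has at least six works and that the relative order is the chronological rank of each hit among the six highest-impact works.

```python
def relative_order_among_top(impacts, n_top=6, n_hits=3):
    """Chronological rank (1..n_top) of each of the n_hits biggest hits
    among the n_top highest-impact works of a career.

    `impacts` lists the works in the order they were produced.
    """
    by_impact = sorted(
        range(len(impacts)), key=lambda i: impacts[i], reverse=True
    )[:n_top]
    chronological = sorted(by_impact)
    return [chronological.index(i) + 1 for i in by_impact[:n_hits]]
```

Pooling these ranks over many careers, separately for the first, second and third biggest hit, gives the distributions P(N) that each candidate model has to reproduce.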


We also tested whether Markov models could account for our observations 
(Supplementary Information S6.2). We explored multiple variants of Markov 
models by introducing short-range correlations between the impacts of adjacent 
works, correlations between the volatility of their impacts, and a hidden 
Markov model with two states, finding again that the hot-streak model stood 
out in its ability to describe the observed patterns (Extended Data Fig. 8 and 
Supplementary Information S6.2). Together, these results demonstrate that none 
of these alternative hypotheses alone can account for the empirical observations 
in real careers. 

Code availability. Code is available at https://lu-liu.github.io/hotstreaks/. 
Data availability. The data are available at https://lu-liu.github.io/hotstreaks/. 
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Extended Data Fig. 1 | Additional results on hot streaks in artistic, 
cultural, and scientific careers. a—c, The cumulative distribution 
P(>N'/N) for the order of the top three highest impact works within a 
career for artists (a), directors (b) and scientists (c). N' denotes the order 
of the i" highest-impact work within a career. The colours denote different 
hit works, and the dashed grey line denotes P(>N'/N) for a uniform 
distribution. d-f, ¢(N**, N***) for the second- and third-highest-impact 
works within a career. @(N**, N***) is also overrepresented along the 


diagonal. g-i, 6(N*, N***) for the first- and third-highest-impact works 
within a career. j-r, We shuffled the order of each work in a career while 
keeping their impact intact. The diagonal patterns in d-i and Fig. la-c 
disappeared for shuffled careers. s-u, ¢(N*, N**) predicted by the hot- 
streak model successfully recovered the diagonal patterns observed in a-c. 
For d-u and Fig. la-c, we applied the same binning procedure to data, 
using bins that ranged from 0 to 1 with increments of 0.1. 
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Extended Data Fig. 2 | Measuring the length of streaks using different d-f, Definitions of the longest streak L for artists (d), directors (e) 
thresholds. a, The distribution of auction price P(Price) for artists. Blue and scientists (f). Dots are coloured orange above the threshold, blue 
dots denote data, and the red line is a log-normal distribution with average otherwise. The lower panel highlights the longest streak in a career. 
jt=7.9 and standard deviation o = 1.5. b, The distribution of film rating g-i, P(L) for real careers and P(L,) for shuffled careers using the mean 
P(Rating) for directors. The red line is a normal distribution with average impact within a career as the threshold. j-l, As in g-i, but using the top 
j4=7.1 and standard deviation o = 1.2. c, The distribution of raw and 10% impact as the threshold to calculate L and Ls. In all cases, P(L) has 
rescaled Cjo (inset) for scientists. The red line is a log-normal distribution, a wider tail than P(L,), indicating that high-impact works in real careers 
with j4= 2.3 and o=1.3 for cand js = —0.4 and 0 = 0.8 for the inset. tend to cluster together. 
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Extended Data Fig. 3 | Varying career length. To test the robustness of 
our results, we repeated our measurements by controlling for the career 
length of individuals. a-i, Artists and directors with careers of at least 

20 years and scientists with careers of at least 30 years. a-c, P(>N'/N) of 
the top three highest-impact works within a career. d-f, R(a2) among the 


top three highest-impact works in a career. g—i, P(L) for real careers and 
P(L,) for shuffled careers. j-r, As in a-i but for artists and directors with 
careers of at least 30 years and scientists with careers of at least 40 years. 
These results demonstrate that the patterns observed in Fig. 1 hold for 
individuals with different career lengths. 
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Extended Data Fig. 4 | Artistic careers from different eras. a, R AN) for real careers and P(L,) for shuffled careers for artists who started their 
artists who started their careers before 1850. b, P(L) for real careers and careers between 1850 and 1900. These results demonstrate that the 
P(L,) for shuffled careers for artists who started their careers before 1850. patterns observed in Fig. 1 hold for artists from different eras. 

GR =) for artists who started careers between 1850 and 1900. d, P(L) for 
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a-c, Correlations between I'y and Il for artists (a; n = 3,166), directors duration of hot streaks over total career lengths. The temporally localized 
(b; n=5,098) and scientists (c; 1 = 18,121). The blue background denotes nature of a hot streak is also captured by its proportion over career 
the kernel density of data, dots represent binning results of data, and length Ty/T. 


the red lines depict a linear fit. Inset, the relationship between 
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Extended Data Fig. 6 | Comparison of g(t) between the null model and 
the hot-streak model. a—c, g(t) of three scientists in our data set with mid- 
career (a), late-career (b) and early-career (c) onset of hot streaks. Red 
dots denote data, the blue line is the null model’s prediction based on early 
performance, and the red line captures the predictions from the hot-streak 
model, with dashed grey lines denoting the start and end of hot streaks. 

d, The difference between our hot-streak model and the null model for 
each individual, Δg(t). Dashed lines with corresponding colours denote 
the start of the hot streak. d illustrates the discrepancies in estimating an 
individual’s future impact if we ignore the uncovered hot streaks. e, The 
distribution of the BIC measure, P(BIC), showing that the hot-streak 
model outperforms the null model in describing g(t) after accounting for 
model complexity. f, The distribution of the MAPE measure, P(MAPE), 
showing that the hot-streak model outperforms the null model in 
describing g(t). g, The uncertainty envelope of g(t) for an individual in 
our data set. Blue dots denote data, and the red line is the fitting result of 
equation (4). Shaded area illustrates predicted uncertainty (one standard 
deviation). h, The fraction of g(t) falling within the envelope for the null 
model (blue) and our hot-streak model (red). Fraction = 1.0 indicates that 
the entire g(t) trajectory falls within the envelope. i, Average MAPE of our 
hot-streak model and the null model for individuals with early-career, 
mid-career and late-career onset of hot streaks. The difference is largest 
for individuals with early-onset hot streaks and smallest for those with 
late-onset ones. 
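
The BIC and MAPE comparisons described in this caption can be illustrated with a minimal sketch. It assumes a least-squares fit with Gaussian errors, so that BIC = n ln(RSS/n) + k ln(n); the exact definitions used by the authors are in their Supplementary Information and may differ. The trajectories and parameter counts below are hypothetical toy values, not data from the study.

```python
import math


def bic_gaussian(observed, predicted, n_params):
    """BIC under an assumed Gaussian error model: n*ln(RSS/n) + k*ln(n)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + n_params * math.log(n)


def mape(observed, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    errors = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


# Hypothetical g(t) trajectory and two hypothetical model predictions.
g_observed = [1.0, 2.0, 4.0, 8.0, 9.0, 10.0]
g_null = [1.5, 3.0, 4.5, 6.0, 7.5, 9.0]      # assumed 1 free parameter
g_hot = [1.0, 2.1, 4.2, 7.8, 9.1, 10.0]      # assumed 3 free parameters

for name, pred, k in [("null", g_null, 1), ("hot-streak", g_hot, 3)]:
    print(name, round(bic_gaussian(g_observed, pred, k), 2),
          round(mape(g_observed, pred), 4))
```

A lower BIC favours a model after penalizing its extra parameters, whereas MAPE measures fit alone, which is why the caption reports both.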
Extended Data Fig. 7 | Testing alternative hot-streak dynamics. 
a-f, Illustrative examples of Γ(N) for the hot-streak model (a), right 
trapezoid function (b), isosceles trapezoid function (c), quadratic function 
(d), tent function (e) and left trapezoid function (f). g, The distribution of 
the relative position P(N) of the three highest-impact works among the six 
highest-impact works within a career for artists, where N denotes the 
relative order among the top six hits. h-m, P(N) predicted by 
corresponding models shown in a-f, respectively, according to artists’ real 
productivity profiles. To test whether data agree with model predictions, 
we measured their statistical difference using the P value of the 
Kolmogorov-Smirnov test for discrete distributions. We colour the 
distributions green if we cannot reject the hypothesis that the data and the 
model predictions come from the same distributions, and red otherwise. 
Among the six models, the hot-streak model is the only model whose 
predictions are consistent with the data in terms of the relative ordering 
among the six highest-impact works observed in real careers. n, The 
proportion of real careers that are captured by the model with the smallest 
BIC among different hypotheses. The hot-streak model again stands out as 
the best model to describe real careers. We repeated the analyses for 
directors (p-v) and scientists (x-ad); the conclusions remained the same 
across all three domains. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Figure panels: x axes Δlog(price), Δrating and Δlog(C10); lag; N*/N. Legend entries: Data, Data shuffle, Model, Model shuffle.]


Extended Data Fig. 8 | Testing Markovian hypotheses. Here we test 
whether the observed patterns can be explained by Markovian dynamics 
that introduce correlations between neighbouring data points. We first 
test the assumptions of the Markovian hypothesis from the data (a-f). 
a-c, The distribution of N, N+ 1 differences between adjacent data points 
observed in real careers for artists (a, n = 3,480), directors (b, n = 6,233) 
and scientists (c, n = 20,040). d-f, The autocorrelation measured in real 
careers for artists (d, n = 3,480), directors (e, n = 6,233), and scientists 
(f, n = 20,040). a-f suggest that there is little short-range correlation 
in data across the three domains. We test three variants of Markovian 
models (g-l). The details of these models are outlined in Supplementary 
Information S6.2. g-i, φ(N*, N**) of the top two highest-impact works 
within a career for three Markovian models using scientists’ profiles as 
input. j-l, The distribution of the longest streak length P(L) and P(L_s) 
using median impact within a career as threshold for the three Markovian 
models. g-l demonstrate that the three Markovian models failed to capture 
the observed colocations among hits. 
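
The longest-streak statistic mentioned in this caption can be sketched as follows. This is an illustrative reading, assuming that a streak is a run of consecutive works whose impact is strictly above the career median and that the shuffled baseline is obtained by randomly permuting the order of works; the function names and the example career are hypothetical.

```python
import random
from statistics import median


def longest_streak(impacts):
    """Length of the longest run of consecutive works above the career median."""
    threshold = median(impacts)
    best = run = 0
    for impact in impacts:
        run = run + 1 if impact > threshold else 0
        best = max(best, run)
    return best


def shuffled_streaks(impacts, n_shuffles=1000, seed=0):
    """Longest-streak lengths for randomly reordered copies of the same career."""
    rng = random.Random(seed)
    works = list(impacts)
    lengths = []
    for _ in range(n_shuffles):
        rng.shuffle(works)
        lengths.append(longest_streak(works))
    return lengths


# Hypothetical career in which the high-impact works cluster together.
career = [1, 2, 1, 8, 9, 7, 10, 2, 1, 3]
real_length = longest_streak(career)
null_lengths = shuffled_streaks(career)
print(real_length, sum(null_lengths) / len(null_lengths))
```

Comparing the real value with the shuffled distribution indicates whether high-impact works are more clustered in time than their order-independent magnitudes would imply.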
denotes the fitting result on Γ sequence for a randomly selected career for
artists (a), directors (b) and scientists (c). Blue dots denote the moving

hot-streak model for each individual.
Extended Data Fig. 10 | Individuals with one or more hot streaks. 
a-c, The distribution of average impacts for individuals with one or 
more than one hot streak for artists (a), directors (b) and scientists (c). 
Blue dots denote individuals with one hot streak, and red dots denote 
individuals with at least two hot streaks. d-f, The distribution of the 
number of works P(N) within a career for individuals with one or more 
than one hot streak for artists (d), directors (e) and scientists (f). 
g-i, The distribution of career length P(T) for individuals with one or more 
than one hot streak for artists (g), directors (h) and scientists (i). j-l, The 
distribution of P(Γ_H) for individuals with one or more than one hot streak 
for artists (j), directors (k) and scientists (l). Between those who have one 
or two hot streaks, there is no detectable difference in terms of typical 
performance metrics, including impact, productivity and career length, 
suggesting that the hot streak captures an orthogonal dimension to current 
metrics characterizing individual careers. 
The incidence of acute myeloid leukaemia (AML) increases with age 
and mortality exceeds 90% when diagnosed after age 65. Most cases 
arise without any detectable early symptoms and patients usually 
present with the acute complications of bone marrow failure’. 
The onset of such de novo AML cases is typically preceded by the 
accumulation of somatic mutations in preleukaemic haematopoietic 
stem and progenitor cells (HSPCs) that undergo clonal expansion”. 
However, recurrent AML mutations also accumulate in HSPCs 
during ageing of healthy individuals who do not develop AML, 
a phenomenon referred to as age-related clonal haematopoiesis 
(ARCH)*®. Here we use deep sequencing to analyse genes that are 
recurrently mutated in AML to distinguish between individuals 
who have a high risk of developing AML and those with benign 
ARCH. We analysed peripheral blood cells from 95 individuals 
that were obtained on average 6.3 years before AML diagnosis 
(pre-AML group), together with 414 unselected age- and gender- 
matched individuals (control group). Pre-AML cases were distinct 
from controls and had more mutations per sample, higher variant 
allele frequencies, indicating greater clonal expansion, and showed 
enrichment of mutations in specific genes. Genetic parameters were 
used to derive a model that accurately predicted AML-free survival; 
this model was validated in an independent cohort of 29 pre-AML 


cases and 262 controls. Because AML is rare, we also developed 
an AML predictive model using a large electronic health record 
database that identified individuals at greater risk. Collectively our 
findings provide proof-of-concept that it is possible to discriminate 
ARCH from pre-AML many years before malignant transformation. 
This could in future enable earlier detection and monitoring, and 
may help to inform intervention. 

To examine the occurrence of somatic mutations before the develop- 
ment of AML, we carried out deep error-corrected targeted sequencing 
of AML-associated genes in a discovery cohort of 95 pre-AML cases 
and 414 age- and gender-matched controls (Supplementary Table 1). 
A validation cohort comprising 29 pre-AML cases and 262 controls 
(Supplementary Table 1) was analysed using deep sequencing with 
an overlapping gene panel. Taking both cohorts together, ARCH, 
defined on the basis of putative driver mutations (ARCH-PD), was 
found in 73.4% of the pre-AML cases at a median of 7.6 years before 
diagnosis. By contrast, ARCH-PD was observed in 36.7% of controls 
(P < 2.2 × 10⁻¹⁶, two-sided Fisher’s exact test; Fig. 1a), consistent with 
data from a study of more than 2,000 unselected individuals assayed 
using a similarly sensitive method*"°. Additionally, 39% of pre-AML 
cases above the age of 50 had a driver mutation with a variant allele 
frequency (VAF) of more than 10%, compared to only 4% of controls, 
Fig. 1 | Prevalence of ARCH, number of mutations and clone size in 
individuals who developed AML. a, Prevalence of ARCH-PD among 
pre-AML cases (red) and controls (blue). b, The number of ARCH-PD 
mutations detected in cases and controls according to age. Box plot 
centres, hinges and whiskers represent the median, first and third quartiles 
and 1.5x interquartile range, respectively. Individual values are indicated 
as dots. c, VAF of ARCH-PD mutations. *P < 0.0005, two-sided Wilcoxon 
rank-sum test with Bonferroni multiple testing correction. All panels show 
data for n = 800 biologically independent samples. 


a prevalence that is in line with the largest studies of ARCH in the 
general population* (P < 2.2 × 10⁻¹⁶, two-sided Fisher’s exact test; 
Extended Data Fig. 1). 
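
As an illustration of the two-sided Fisher’s exact test used for the prevalence comparison, the following is a minimal sketch in pure Python. The cell counts are assumptions back-calculated from the reported percentages (73.4% of 124 pre-AML cases and 36.7% of 676 controls) and are not taken from the authors’ tables, so the output is not expected to reproduce the published P value exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the sample odds ratio and the P value, computed by summing all
    hypergeometric probabilities no larger than that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    return (a * d) / (b * c), min(p_value, 1.0)


# Assumed counts: 91 of 124 pre-AML cases and 248 of 676 controls with ARCH-PD.
odds_ratio, p_value = fisher_exact_two_sided(91, 33, 248, 428)
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.2e}")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that factorial-based floating-point formulas can show at this sample size.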

The median number of ARCH-PD mutations per individual 
increased with age and was significantly higher in the pre-AML group 
relative to controls (Fig. 1b and Supplementary Table 2). Furthermore, 
examination of ARCH-PD VAF distribution revealed significantly 
larger clones among the pre-AML cases (P = 1.2 × 10⁻¹³, two- 
sided Wilcoxon rank-sum test; Fig. 1c). To gain insight into clonal 
growth dynamics, we examined serially collected samples that 
were available for a subset of the validation cohort. We did not find 
significant differences in clonal expansion rates between pre-AML 
cases and controls (Extended Data Fig. 2a, b), although this may 
in part reflect the shorter follow-up of pre-AML cases, small sam- 
ple size and large variance in growth rates (Extended Data Fig. 2c). 
The observed differences between pre-AML cases and controls 
may arise through cell-intrinsic or -extrinsic factors. Although 
these variables have not been adequately studied in ARCH, a 
number of observations in different contexts, such as aplasia, 
advanced age and after chemotherapy, have shown that increased 
clonal fitness is associated with distinct mutations depending on 
context!*"!*, Notably, mutations in splicing factor genes were signif- 
icantly enriched among the pre-AML cases relative to the controls 
(odds ratio, 17.5; 95% confidence interval, 8.1-40.4; P = 5.2 × 10⁻¹⁶, 
two-sided Fisher’s exact test) and were present in significantly younger 
individuals (median age 60.3 compared to 77.3 years, P = 1.7 × 10⁻⁴, 
two-sided Wilcoxon rank-sum test; Fig. 2a). Previous work suggests 
that spliceosome mutations appear to confer a competitive advantage 
in the context of ageing'®. Therefore, it is possible that the signifi- 
cantly higher prevalence of such clones in younger pre-AML cases 
may reflect extrinsic selection pressures rather than earlier mutation 
acquisition. 
Fig. 2 | Accumulation of specific recurrent AML mutations in healthy 
individuals at a young age is associated with progression to AML. 

a, Relative frequency of mutations in the indicated genes according to age 
group for pre-AML cases and controls. b, Proportion of pre-AML cases 
(red) and controls (blue) who had ARCH-PD mutations in recurrently 
mutated genes. *P < 0.05, Fisher’s exact test with Bonferroni multiple 
testing correction. c, The cumulative frequency of recurrent AML 
mutations (reported in >5 specimens in COSMIC) in pre-AML cases and 
controls. ARCH-PD mutations are ranked from left to right along the 

x axis from low to high recurrence. d, VAF of recurrent mutations in pre- 
AML cases and controls. Low, intermediate and highly recurrent COSMIC 
mutations are defined as those reported in 5-19 samples, 20-300 samples 
and >300 samples, respectively. Box plots indicate median, first and third 
quartiles and 1.5x interquartile range. *P < 0.05, two-sided Wilcoxon 
rank-sum test with Bonferroni multiple testing correction. All panels show 
data for n = 800 unique individuals. 


In line with previous reports™®, we found that DNMT3A and TET2 
were the most commonly mutated genes in both groups (Fig. 2b). 
We could not identify any canonical NPM1 mutations nor any FLT3- 
internal tandem duplication mutations, consistent with these arising 
late in leukaemogenesis'®!?, Recurrent CEBPA mutations, which are 
implicated in around 10% of de novo AML", were also absent, sug- 
gesting that driver events in this gene may also be late events in AML 
evolution. In order to quantify the effect of different mutations on 
the likelihood of progression to AML, we ranked ARCH-PD muta- 
tions based on the number of times that they have been reported 
in the Catalogue of Somatic Mutations in Cancer (COSMIC) database 
among individuals with haematological malignancies'*. We found that 
mutations that are highly recurrent in cancer specimens were more 
common in pre-AML cases than in controls with ARCH-PD, whereas 
driver events in the controls tended to affect loci that are less 
frequently mutated in haematological malignancies and occurred at 
significantly lower VAF (Fig. 2c, d). Overall, these findings demon- 
strate notable differences in the mutational landscape of ARCH and 
pre-AML. Moreover, this work, in conjunction with recent insights 
into the origins of AML relapse’®, suggests that AML progression 
typically occurs over many years through clonal evolution of pre- 
leukaemic HSPCs before acquisition of late mutations leads to overt 
malignant transformation. 
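
The recurrence ranking described above can be sketched as follows, using the bins given in the Fig. 2 legend (low, 5-19 samples; intermediate, 20-300 samples; high, more than 300 samples). The mutation names and counts are hypothetical placeholders, not COSMIC values.

```python
def recurrence_class(cosmic_count):
    """Assign a recurrence class from the number of COSMIC samples reported."""
    if cosmic_count > 300:
        return "high"
    if cosmic_count >= 20:
        return "intermediate"
    if cosmic_count >= 5:
        return "low"
    return "below threshold"


def rank_by_recurrence(mutation_counts):
    """Order mutations from low to high recurrence, as along the x axis of Fig. 2c."""
    return sorted(mutation_counts.items(), key=lambda item: item[1])


# Hypothetical mutations with made-up COSMIC sample counts.
example_counts = {"mutation_A": 7, "mutation_B": 450,
                  "mutation_C": 42, "mutation_D": 2}
for name, count in rank_by_recurrence(example_counts):
    print(name, count, recurrence_class(count))
```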
Fig. 3 | Model of future risk of AML. a, Forest plot of the risk of AML. 
Purple, orange and green circles indicate hazard ratios for the discovery 
(DC), validation (VC) and combined cohort, respectively. The horizontal 
lines denote 95% confidence intervals for the combined cohort. For 

each gene, the indicated hazard ratio applies to the 10-year risk of AML 
development conferred by each 5% increase in mutation VAF. The green 
vertical line indicates the mean hazard ratio across all genes. The hazard 
ratio for RUNX1 must be interpreted with caution owing to the relatively 
high prevalence of deleterious germline variants in this gene, which may 
not be readily distinguishable from somatic mutations in unmatched 


On the basis of these findings, we next developed an approach to 
quantify the relative contributions of driver mutations and clone sizes 
to the risk of progressing to AML. We tested different regularised 
logistic and Cox proportional hazards regression approaches, which 
achieved similar performance in both the discovery cohort (concord- 
ance (C) = 0.77 ± 0.03) and the validation cohort (C = 0.84 ± 0.05; 
Extended Data Figs. 3, 4 and Supplementary Table 3). Models that were 
only trained on data from the discovery or validation cohort had sim- 
ilar coefficients (Fig. 3a). We therefore combined the datasets for a 
more accurate analysis of the contributions of mutations in individual 
genes to risk (C = 0.77 ± 0.05; area under curve, 0.79; Supplementary 
Table 3). Quantitatively, we found that driver mutations in most genes 
conferred an approximately twofold increased risk of developing AML 
per 5% increase in clone size (Fig. 3a and Supplementary Table 3). 
Notable exceptions to this trend are the most frequently mutated ARCH 
genes, DNMT3A and TET2, which confer a lower risk of progression to 
AML (Fig. 3a, b and Supplementary Table 3). By contrast, a larger effect 
size was apparent for TP53 (hazard ratio, 12.5; 95% confidence inter- 
val, 5.0-160.5) and U2AF1 (hazard ratio, 7.9; 95% confidence interval, 
4.1-192.2) mutations (Fig. 3a, b). However, we note that other ARCH-PD 
genes, such as SRSF2, can contribute a similar relative risk owing to 
their presence at a higher VAF in pre-AML cases (Fig. 3a, Extended 
Data Fig. 5a and Supplementary Note). Of note, mutations in TP53 and 
spliceosome genes (including U2AF1) are also associated with a poorer 
prognosis in AML". Because the effect of each ARCH-PD mutation 
is deleterious and the effect of multiple mutations that are present in 
the same individual is multiplicative, a higher number of mutations is 
predicted to increase the risk of progression to AML (Fig. 3c). Similarly, 
sequencing assays (see Methods). The proportion of individuals with 
mutations in each gene and the average VAF are indicated to the right of 
the forest plot; red and blue circles represent pre-AML cases and controls, 
respectively, with circle sizes scaled to reflect mutation frequency and 
VAF. b-d, Kaplan-Meier curves of AML-free survival, defined as the time 
between sample collection and AML diagnosis, death or last follow-up. 
Survival curves are stratified according to mutation status for selected 
genes (b), number of driver mutations per individual and largest clone 
detected (c) and RDW (d). Data for n = 796 unique individuals (a-c); 

n = 299 individuals for whom RDW measurements were available (d). 


the size of the largest driver clone was also strongly associated with the 
risk of progression to AML, in agreement with the risk of individual 
mutations generally being proportional to VAF (Fig. 3c). Collectively, 
although the VAF and the number of mutations confer much of the 
predictive value, this model does demonstrate distinct gene-level risk 
factors, and is able to quantify the cumulative impact of multiple muta- 
tions and clonal size on the likelihood of progression to AML. 
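
The multiplicative structure described in this paragraph can be illustrated with a short sketch. It assumes a proportional-hazards form in which each mutation contributes its per-5%-VAF hazard ratio raised to the power VAF/5%, and in which contributions from co-occurring mutations multiply. Only the TP53 and U2AF1 point estimates and the approximately twofold figure for most genes come from the text; the function name, the default value and the example clones are illustrative assumptions, and confidence intervals are ignored.

```python
# Point estimates quoted in the text (hazard ratio per 5% increase in VAF).
HAZARD_RATIO_PER_5PCT_VAF = {"TP53": 12.5, "U2AF1": 7.9}

# "Approximately twofold" for most genes; assumed here as a default.
# Note: the text states that DNMT3A and TET2 confer a lower risk, without a
# value in this section, so this default would overstate their contribution.
DEFAULT_HAZARD_RATIO = 2.0


def relative_hazard(mutations):
    """Relative hazard for a list of (gene, VAF) pairs, VAF given as a fraction."""
    hazard = 1.0
    for gene, vaf in mutations:
        hr = HAZARD_RATIO_PER_5PCT_VAF.get(gene, DEFAULT_HAZARD_RATIO)
        hazard *= hr ** (vaf / 0.05)   # one hazard-ratio step per 5% of VAF
    return hazard


# Hypothetical individuals.
print(relative_hazard([("SRSF2", 0.10)]))                  # one clone at 10% VAF
print(relative_hazard([("TP53", 0.05), ("SRSF2", 0.10)]))  # two co-occurring clones
```

The second example shows why both the number of mutations and the size of the largest clone carry predictive value under this form: each additional clone multiplies the hazard, and a larger VAF increases the exponent.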
Although our predictive model performs well in identifying those 
at risk of developing AML in our experimental cohorts, AML inci- 
dence rates in the general population are low (4:100,000)¹, and thus 
millions of individuals would need to be screened to identify the 
few pre-AML cases, with many false positives. We therefore sought 
to determine whether routinely available clinical information could 
improve prediction accuracy or identify a high-risk population for 
targeted genetic screening. We first analysed complete blood count 
and biochemistry data that were available for 37 of the pre-AML cases 
and 262 controls. As reported previously*!!’, ARCH-PD was over- 
whelmingly associated with normal blood counts and this was also 
the case for pre-AML cases, indicating that these did not represent 
undiagnosed myelodysplastic syndrome’. We identified a significant 
association between higher red blood cell distribution width (RDW) 
and risk of progression to AML (P = 0.0016, Wald test with Bonferroni 
multiple-testing correction, Fig. 3d). Although traditionally used in the 
evaluation of anaemia, raised RDW has been correlated with inflam- 
mation, ineffective erythropoiesis, cardiovascular disease and adverse 
outcomes in several inflammatory and malignant conditions’. The 
correlation between RDW and risk of AML development remained 
highly significant when controls without ARCH-PD were excluded 
Fig. 4 | Increased risk of AML development inferred from electronic 
health records. a, Box plot of normalized laboratory measurements. 
Increased RDW, reduction in monocyte, platelet, red blood cell (RBC) and 
white blood cell (WBC) counts (top) show a high association (bottom) 
with a higher risk of AML development and differed at least a year before 


from the analysis (P = 3.5 × 10⁻⁶, Wald test with Bonferroni multiple 
testing correction; Extended Data Fig. 5b). Higher RDW has previously 
been associated with ARCH and overall mortality°, but has never been 
shown to distinguish ARCH from pre-leukaemia. In order to verify 
RDW as a predictive factor and determine whether additional clinical 
parameters are associated with risk of AML development, we studied 
the Clalit database”’, which contains electronic health records that 
include an average of 3.45 million individuals per year and data that 
were collected over a 15-year period*!. We identified 875 cases with 
AML using stringent criteria based on diagnostic codes and treatment 
records (Extended Data Fig. 6 and Supplementary Table 4). Analysis 
of RDW trends revealed significantly raised measurements several 
years before AML diagnosis relative to age and sex-matched controls 
(Fig. 4a). Additional parameters that correlated with risk of AML 
development included reductions in monocyte, platelet, red blood 
cell and white blood cell counts, albeit usually remaining above the 
thresholds for clinically relevant cytopenias'® (Fig. 4a and Extended 
Data Fig. 7). These findings suggest that evolving de novo AML may 
sometimes have a considerable prodrome with subtle but discernible 
clinical manifestations. We next applied a machine-learning approach 
to construct an AML prediction model based entirely on variables that 
are routinely documented in electronic health records (Extended Data 
Fig. 8 and Supplementary Table 4). This model was able to predict AML 
6-12 months before diagnosis with a sensitivity of 25.7% and overall 
specificity of 98.2%. The model performed consistently across different 
age groups with an increased relative risk of 28 and 24 for males and 
females, respectively, between the age of 60 and 70 years (Fig. 4b). To 
better understand which patients are most likely to be accurately clas- 
sified by this model, we compared absolute laboratory values for true 
positives and false negatives. We found that 35.5% of false-negative 
predictions were for patients for whom infrequent blood count data 
were available (Extended Data Fig. 9). Some of the true-positive cases 


AML diagnosis. b, Model performance stratification by age and gender. 
Age ranges are indicated above each graph. c, Absolute laboratory values 
for true positive (TP) and false negative (FN) predictions. Box plots 
indicate median, first and third quartiles and 1.5 x interquartile range. 


had mildly abnormal blood counts that would not initiate a diagnostic 
work-up (Fig. 4c), and cytopenias that would be compatible with undi- 
agnosed myelodysplastic syndrome!® were uncommon. 
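
To make the screening arithmetic concrete, the following sketch computes the positive predictive value implied by the reported sensitivity (25.7%) and specificity (98.2%) when applied to an unselected population. It assumes, as a simplification, that the quoted incidence of 4 per 100,000 can stand in for the fraction of screened individuals who will develop AML within the prediction window; the function name is illustrative.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive predictions that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Reported model characteristics with an assumed prevalence of 4 per 100,000.
ppv = positive_predictive_value(0.257, 0.982, 4 / 100_000)
false_per_true = (1 - ppv) / ppv
print(f"PPV = {ppv:.5f}; about {false_per_true:.0f} false positives per true positive")
```

Under these assumptions, false positives outnumber true positives by more than three orders of magnitude, which is consistent with the argument for restricting genetic screening to a clinically enriched high-risk group.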

Collectively, our findings provide new insights into the pre-clinical 
evolution of AML and support the hypothesis that individuals at high 
risk of AML development can be identified years before they develop 
overt disease. To this end, we present two distinct models for the pre- 
diction of de novo AML: one based on somatic point mutations and 
the other on routinely documented clinical information. We find that 
basic clinical and laboratory data can identify a high-risk subgroup 
6-12 months before AML presentation, while genetic information can 
identify a substantial fraction of cases several years to more than a 
decade before diagnosis. By characterizing features that distinguish 
benign ARCH from pre-leukaemia, our models give valuable insights 
into leukaemogenesis. It is evident from the current study, together with 
our recent analysis of mutation acquisition from pre-leukaemic devel- 
opment through to relapse!®, that long-term pre-leukaemic HSPCs fre- 
quently carry mutations and undergo considerable clonal expansion 
while retaining differentiation capacity for years before AML diagnosis. 
Furthermore, it is clear that some mutations, particularly those affect- 
ing TP53 and U2AF1, impart a relatively high risk of subsequent AML, 
whereas mutations in other genes, for example DNMT3A and TET2, 
confer a lesser risk of malignant transformation. Previous studies sug- 
gest that oncogenic mutations in TP53 and spliceosome genes confer 
little or no competitive advantage in the absence of particular selective 
pressures’!*, indicating that cell-extrinsic factors may be important 
determinants of clonal trajectory. 

Cancer predictive models have enabled successful early detection 
and intervention programmes for several solid tumours?>-*°. However, 
screening tests are unavailable for the sub-clinical stages of most 
haematological malignancies. Our study provides proof-of-concept 
for the feasibility of early detection of healthy individuals at high risk 
of developing AML, and is a first step in the design of future clinical 
studies to investigate the potential benefits of early interventions in 
this deadly disease. However, the infrequency of AML necessitates 
that future screening tests provide high sensitivity and specificity. Our 
findings suggest that basic clinical data may identify a higher risk pop- 
ulation that might benefit from targeted genetic screening. Equally, 
combining clinical and genetic information in a single model and 
including structural driver events is likely to improve model accuracy 
further. Nevertheless, establishing the utility of such a tandem approach 
will require extensive clinical and genetic analysis on the same popula- 
tion cohort, in a prospective setting. Furthermore, ARCH is associated 
with several non-malignant conditions*’, and may have a causal role in 
cardiovascular disease*®”’. Therefore, genetic testing for ARCH may 
also prove useful in the management of common age-related diseases. 
Moreover, this study has broader implications for cancer screening and 
early intervention beyond AML. Advances in sequencing technologies 
have revealed a remarkable degree of somatic genetic diversity in nor- 
mal ageing tissues, often characterized by the presence of clones that 
have canonical oncogenic mutations”®. The degree to which clones at 
high risk of malignant transformation can be reliably distinguished 
from their indolent counterparts is an important biological question 
with compelling clinical ramifications. Understanding the selective 
pressures and cell-intrinsic mechanisms governing clonal fate is the 
next important step in developing strategies to predict and prevent 
progression to overt malignancy. 
METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Study participants. Samples for both the discovery and validation cohort were 
obtained from participants in the EPIC study”. All relevant ethical regulations 
were followed. Written informed consent was obtained from all participants in 
accordance with the Declaration of Helsinki and protocols were approved by 
the relevant ethics committees (IARC Ethics Committee approval #14-31, the 
Weizmann Institute of Science Ethics board approval #60-1 and East of England- 
Cambridgeshire and Hertfordshire Research Ethics Committee reference num- 
ber 98CN01). Patients with AML were identified based on the following ICD9 
codes: 9861/3, 9860/3, 9801/3, 9866/3, 9891/3, 9867/3, 9874/3, 9840/3, 9872/3, 
9895/3, 9873/3, which included only cases of de novo AML, and no secondary 
AML. All patients provided peripheral blood samples for which the buffy coat 
fractions were separated and aliquoted for long-term storage in liquid nitrogen 
before DNA extraction. 

Discovery cohort. In total, 509 DNA samples were collected from individuals 
upon enrolment into the EPIC study between 1993 and 1998 across 17 different 
centres” (Supplementary Table 1). Altogether, 95 individuals who developed AML 
an average of 6.3 years (interquartile range (IQR) = 4.8 years) after the sample was 
collected were included in the pre-AML group. For the control group, 414 age- and 
gender-matched individuals were selected, as they did not develop any haematolog- 
ical disorders during the average follow-up period of 11.6 years (IQR = 2.1 years). 
The median age at recruitment was 56.7 years (range, 36.08-74.42). In order to 
minimize any possible demographic biases, an approximate 1:4.5 pre-AML to con- 
trol ratio was maintained across the different centres. 

Validation cohort. Samples were obtained from individuals enrolled in the EPIC- 
Norfolk longitudinal cohort study between 1994 and 2010. Samples and clinical 
metadata were available from 37 patients with AML (of whom 8 were already 
included in the discovery cohort) and 262 age- and gender-matched controls with- 
out a history of cancer or any haematological conditions. The average time between 
the first blood sampling and AML diagnosis was 10.5 years (IQR = 8.3 years). The 
average follow-up period for the control cohort was 17.5 years (IQR = 3.8 years). For 
12 individuals in the pre-AML cohort, 2-3 blood specimens were available, taken 
a median of 3.4 years apart. Of the 262 controls, 141 had multiple blood samples 
available, spanning a median of 10.5 years. Blood counts and other clinical param- 
eters were available for all study participants (Supplementary Table 1). 

Targeted sequencing. Discovery cohort sequencing. Targeted deep sequencing was 
performed using error-corrected sequencing as follows. 

Shearing of genomic DNA, preparation of pre-capture sequencing libraries, 
hybridization-based enrichment, assessment of library quality and enrich- 
ment following hybridization were performed as previously described”. In brief, 
100 ng of genomic DNA was sheared before library construction (KAPA Hyper 
Prep Kit KK8504, Kapa Biosystems) with a Covaris E220 instrument using the 
recommended settings for 250-bp fragments. Following end repair and A-tailing, 
adaptor ligation was performed using 100-fold molar excess of Molecular Index 
Adaptor. Library clean-up was performed with Agencourt AMPure XP beads 
(Beckman-Coulter) and the ligated fragments were then amplified for eight cycles 
using 0.5 μM Illumina universal and indexing primers. 

Targeted capture was carried out on pools containing three indexed libraries. 
Each pool of adaptor-ligated DNA was combined with 5 μl of 1 mg ml⁻¹ Cot-1 
DNA (Invitrogen), and 1 nmol each of xGen Universal Blocking Oligo, TS-p5, 
and xGen Universal Blocking Oligo, TS-p7 (8 nucleotides). The mixture was 
dried using a SpeedVac and then re-suspended in 1.1 μl water, 8.5 μl NimbleGen 
2× hybridization buffer and 3.4 μl NimbleGen hybridization component A. The 
mixture was heat denatured at 95°C for 10 min before addition of 4 μl of xGen 
Lockdown Probes (xGen AML Cancer Panel v.1.0, 3 pmol). Each pool was then 
hybridized at 47°C for 72 h. Washing and recovery of the captured DNA was 
performed according to the manufacturer’s specifications. In brief, 100 μl of clean 
streptavidin beads was added to each capture. Following separation and removal 
of the supernatant using a magnet, 200 μl 1× Stringent Wash Buffer was added and 
the reaction was incubated at 65°C for 5 min. The supernatant containing unbound 
DNA was removed before repeating the high stringency wash one additional time. 
Then, the bound DNA was washed as follows: (1) 200 μl 1× Wash Buffer I and 
separation of the supernatants by magnetic separation; (2) 200 μl 1× Wash Buffer 
II after magnetic separation; (3) 200 μl 1× Wash Buffer III and removal of the 
supernatants using magnetic separation. The captured DNA on beads was resus- 
pended in 40 μl of Nuclease-Free water before dividing the total volume into two 
PCR tubes and subjecting the libraries to 10 cycles of post-capture amplification 
(manufacturer-recommended conditions; Kapa Biosystems). Before sequencing, 
libraries were spiked with 2% PhiX. 

Validation cohort sequencing. Targeted sequencing was performed using a 
custom complementary RNA bait set (SureSelect, Agilent, ELID 0537771) designed 
complementary to all coding exons of 111 genes that have been implicated in 
myeloid leukaemogenesis (Extended Data Table 1). Genomic DNA was extracted from 
peripheral whole blood and sheared using the Covaris M220. Equimolar pools of 
10 libraries were prepared and sequenced on the Illumina HiSeq 2000 using 75-bp 
paired-end sequencing as per Illumina and Agilent SureSelect protocols. 
Variant calling. Discovery cohort variant calling and error correction. The 126-bp 
paired-end sequencing reads from the Illumina platform were converted 
to FASTQ format; the 2-bp molecular barcode at each read of the 
pair was trimmed and written into the read name. The thymine nucleotide 
required for ligation was removed from the sequences. The Burrows-Wheeler aligner 
(BWA-mem)*! was used for alignment of the processed FASTQ files to the 
reference hg19 genome, after realignment of insertions and deletions (indels) using 
GATK™. An in-house algorithm was written to collapse read families that share 
the same molecular barcode sequence, the left-most genomic position of where 
each read of the pair maps to the reference and the CIGAR string. Families that 
consisted of at least two reads were used to generate consensus reads and a con- 
sensus base was called when there was at least 70% agreement. When a consensus 
base was called, it was assigned with the maximum base quality score observed in 
its corresponding pre-collapsed reads. Furthermore, when possible, duplex reads* 
were generated from two consensus reads, from a singleton read and a consensus 
read, or from two singleton reads. For each sequenced sample, we generated two 
BAM files, called BAM1 and BAM2. BAM1 consisted of duplex reads, consensus 
reads and singleton reads, thereby including some error-corrected and 
non-error-corrected reads, while still containing all the genomic information encoded in the 
data in the form of unique DNA molecules. BAM2 consisted of duplex reads and 
consensus reads but not singleton reads. Both files were then analysed to detect 
single nucleotide variants (SNVs) and small indels using Varscan2™. To further 
remove sequencing artefacts and improve sensitivity, we applied a two-step pol- 
ishing statistical approach that models the error rate for each sequenced genomic 
position. For both steps, BAM1 was used and all samples except the sample that 
was investigated were included for error rate modelling. At step one, as previously 
described*”, the error rates were modelled by fitting Weibull distribution curves to 
the non-reference allele fractions. SNVs with allele fractions that were statistically 
distinguishable from the background error rates (P = 0) were further analysed. 
At step 2, the coverage of the non-reference allele fractions was considered using 
a linear fit describing the negative correlation that exists between the 
log(non-reference allele fraction) and the corresponding log(coverage) values. This 
allowed us to estimate different error rates at different coverage depths. Because 
indel errors are rare and cannot be appropriately modelled by the same statistical 
framework, they were called using barcode-mediated error correction alone. At 
least 10 consensus reads, 5 supporting reads on the forward strand, 5 supporting 
reads on the reverse strand and 2 duplex reads were required to call an indel. 
Additional post-processing steps applied to data from both the discovery cohort 
and validation cohort are detailed in ‘Additional post-processing filters applied to 
discovery and validation cohort data’. Variants were annotated using Annovar®. 
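
The read-family collapsing step described above can be illustrated with a minimal Python sketch. This is not the authors' in-house algorithm, which is available only on request. The record layout (dictionaries with `barcode`, `pos`, `cigar`, `seq` and `qual` keys), the masking of low-agreement positions as `N`, and taking the maximum quality over all reads in the family are assumptions made for illustration; only the family key, the two-read minimum and the 70% agreement threshold come from the text.

```python
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 2   # families of at least two reads yield a consensus (from the text)
MIN_AGREEMENT = 0.70  # at least 70% agreement is needed to call a base (from the text)


def group_families(reads):
    """Group reads that share barcode, left-most mapping position and CIGAR string."""
    families = defaultdict(list)
    for read in reads:
        families[(read["barcode"], read["pos"], read["cigar"])].append(read)
    return families


def consensus_read(family):
    """Return (sequence, qualities) for one family, or None for singletons.

    Assumption: positions without sufficient agreement are masked as 'N'
    with quality 0. Reads in a family share a CIGAR string, so they are
    assumed to have equal length.
    """
    if len(family) < MIN_FAMILY_SIZE:
        return None
    seq, quals = [], []
    for i in range(len(family[0]["seq"])):
        bases = [r["seq"][i] for r in family]
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= MIN_AGREEMENT:
            seq.append(base)
            # Maximum base quality observed in the pre-collapsed reads.
            quals.append(max(r["qual"][i] for r in family))
        else:
            seq.append("N")
            quals.append(0)
    return "".join(seq), quals
```

Singleton families return `None` here; in the text they are retained as singleton reads in BAM1 and excluded from BAM2.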
Validation cohort variant calling. Sequencing reads were aligned to the 
reference genome (GRCh37d5) using the Burrows-Wheeler aligner (BWA-aln)*!. 
Unmapped reads, PCR duplicates and reads mapping to regions outside the target 
regions (merged exonic regions and 10 bp either side of each exon) were excluded 
from analysis. Sequencing depth at each base was assessed using Bedtools coverage 
v.2.24.0°°. 

Somatic SNVs were called using shearwater, an algorithm developed for 
detecting subclonal mutations in deep-sequencing experiments 
(https://github.com/gerstung-lab/deepSNV v.1.21.5)*”-*° considering only reads with minimum 
nucleotide and mapping quality of 25 and 40, respectively. This algorithm models 
the error rate at individual loci using information from multiple unrelated 
samples. Additionally, allele counts at the recurrent AML mutation hotspots listed in 
‘Curation of oncogenic variants’ were generated using an in-house script 
(https://github.com/cancerit/alleleCount) and manually inspected in the JBrowse genome 
browser“. To further complement our SNV calling approach, we applied an 
extensively validated in-house version of CaVEMan v.1.11.2 (Cancer variants through 
expectation maximization)*!. CaVEMan compares sequencing reads between 
study and nominated normal samples and uses a naive Bayesian model and 
expectation-maximization approach to calculate the probability of a somatic 
variant at each base (https://github.com/cancerit/CaVEMan). 

Post-processing filters required that the following criteria were met for 
CaVEMan to call a somatic substitution. (1) If coverage of the mutant allele was 
less than 8, at least one mutant allele was detected in the first two-thirds of the 
read. (2) Less than 3% of the mutant alleles with base quality >15 were found in 
the nominated normal sample. (3) Mean mapping quality of the mutant allele reads 
was >21. (4) The mutation does not fall in a simple repeat or centromeric region. 
(5) Fewer than 10% of the reads covering the position contained an indel according 
to mapping. (6) Less than 80% of the reads report the mutant allele at the same 
read position. (7) At least a third of the reads calling the variant had a base quality 
of 25 or higher. (8) Not all mutant alleles reported in the second half of the read. 
(9) Position does not fall within a germline insertion or deletion. 

The following additional post-processing criteria were applied to all SNV calls. 
(1) Minimum VAF = 0.5% with a minimum of five bidirectional calls reporting 
the mutant allele (with at least two reads in forward and reverse directions). (2) No 
indel called within a read length (75 bp) of the putative substitution. 
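
As an illustration, the two additional SNV criteria above could be expressed as a simple predicate. This is a sketch under stated assumptions rather than the pipeline actually used: the `SnvCall` record and its field names are hypothetical, and treating an indel exactly 75 bp away as "within a read length" is one reading of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnvCall:
    """Hypothetical per-variant summary; field names are illustrative only."""
    alt_forward: int                       # mutant-allele reads on the forward strand
    alt_reverse: int                       # mutant-allele reads on the reverse strand
    depth: int                             # total reads covering the position
    nearest_indel_distance: Optional[int]  # bp to the nearest called indel, None if absent


def passes_snv_filters(call, min_vaf=0.005, min_alt=5, min_per_strand=2, read_length=75):
    """Apply criterion (1), VAF and bidirectional support, and criterion (2), no nearby indel."""
    alt_total = call.alt_forward + call.alt_reverse
    if call.depth == 0 or alt_total / call.depth < min_vaf:
        return False
    if alt_total < min_alt:
        return False
    if min(call.alt_forward, call.alt_reverse) < min_per_strand:
        return False
    if call.nearest_indel_distance is not None and call.nearest_indel_distance <= read_length:
        return False
    return True
```

For example, a call with three forward and two reverse mutant reads at a depth of 1,000 passes, whereas four forward reads and one reverse read at the same depth fail the strand requirement.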

Small indels were sought using two complementary bioinformatics approaches. 
First, an in-house version of Pindel v.2.2” (https://github.com/cancerit/cgpPindel) 
was applied. We additionally used the aforementioned deepSNV algorithm in order 
to increase sensitivity for indels present at low VAF. VAF correction was performed 
using an in-house script (https://github.com/cancerit/vafCorrect). 

Post-processing filters required that the following criteria were met for a variant 
to be called. (1) A minimum of five reads supporting the variant with a minimum 
of two reads in each direction. For Pindel, the total read count was based on the 
union of the BWA and Pindel reads reporting the mutant allele. (2) VAF > 0.5%. 
(3) Variant not present within an unmatched normal panel of approximately 400 
samples. (4) No reads supporting the variant identified in the nominated normal 
sample. 

Mutations were annotated according to ENSEMBL v.58 using VAGrENT® 
for transcript and protein effects (https://github.com/cancerit/VAGrENT) and 
Annovar*® for additional functional annotation. 

Additional post-processing filters applied to discovery and validation cohort data. The 
following variants were flagged for additional inspection for potential artefacts, 
germline contamination or index-jumping events. (1) Any mutant allele reported 
within 75 bp of another variant. (2) Any mutant allele with a population allele 
frequency >1 in 1,000 according to any of five large polymorphism databases 
(ExAC, 1000 Genomes Project, ESP6500, CG46 and Kaviar) that is not a canon- 
ical hotspot driver mutation with COSMIC recurrence >100. (3) Mutations that 
were present in >10% of the control cohort but not recurrent in COSMIC were 
flagged as potential germline variants or sequencing artefacts. (4) As artefactual 
indels tend to be recurrent, any indels occurring in >2 samples were flagged 
for additional inspection. 

Curation of oncogenic variants. Putative oncogenic variants were identified 
according to evidence for functional relevance in AML as previously described 
and used to define ARCH-PD". 

Variants were annotated as likely driver events if they fulfilled any of the 
following criteria. (1) Truncating mutations (nonsense, essential splice site or frameshift 
indel) in the following genes implicated in AML pathogenesis by loss-of-function: 
NF1, DNMT3A, TET2, IKZF1, RAD21, WT1, KMT2D, SH2B3, TP53, CEBPA, 
ASXL1, RUNX1, BCOR, KDM6A, STAG2, PHF6 and KMT2C. (2) Truncating 
variants in CALR exon 9. (3) JAK2 V617F. (4) FLT3 internal tandem duplication. 
(5) Non-synonymous variants at the following hotspot residues: CBL E366, L380, C384, 
C404, R420 and C396; DNMT3A R882; FLT3 D835; IDH1 R132; IDH2 R172 and 
R140; KIT W557, V559 and D816; KRAS A146, Q61, G13 and G12; MPL W515; 
NRAS Q61, G12 and G13; SF3B1 K700 and K666; SRSF2 P95; U2AF1 Q157, R156 
and S34. (6) Non-synonymous variants reported at least 10 times in COSMIC 
with VAF <42% and population allele frequency <0.003. (7) Non-synonymous 
variants clustering within a functionally validated locus or within four amino acids 
of a hotspot variant with population allele frequency <0.003 and VAF <42%. (8) 
Non-synonymous variants reported in COSMIC >100 times with population allele 
frequency <0.003 regardless of VAF. 
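
Criteria (6) and (8) are purely numerical and can be sketched as a small function. The function name and argument names are hypothetical, and the sketch covers only these two recurrence-based rules; the gene-specific and hotspot rules (1) to (5) and the clustering rule (7) are not modelled here.

```python
def is_driver_by_cosmic_recurrence(cosmic_count, vaf, population_af):
    """Recurrence-based driver rules (6) and (8) for a non-synonymous variant.

    cosmic_count  : number of times the variant is reported in COSMIC
    vaf           : variant allele fraction in the sample (0 to 1)
    population_af : population allele frequency (0 to 1)
    """
    if population_af >= 0.003:
        return False          # both rules require population allele frequency <0.003
    if cosmic_count > 100:
        return True           # rule (8): applies regardless of VAF
    return cosmic_count >= 10 and vaf < 0.42  # rule (6)
```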

Our driver curation strategy inevitably runs a small risk of including germline 
variants in familial AML genes. We feel that in the real world, where a matched 
constitutional DNA sample would be unavailable, this is the best approach. 
Statistical analysis. All statistical analyses were performed in the R statistical 
programming environment. A two-sided Wilcoxon rank-sum test was used to 
assign significance levels for differences in the median number of somatic mutations 
among the pre-AML and control groups, the median VAF of mutations among 
groups, and the age of individuals with spliceosome mutations. Fisher’s exact test 
was used to assess the significance of differences in the prevalence of ARCH among 
the groups and spliceosome mutations in the pre-AML group. 

Predictive modelling. Cox proportional hazards model with random effects. We 
used a Cox proportional hazards regression to model AML progression-free sur- 
vival as previously described'*"*. We used random effects for the Cox proportional 
hazards model in the CoxHD R package (http://github.com/gerstung-lab/CoxHD). 
A key strength of this approach is the ability to include many variables in one 
model while shrinking estimated effects for parameters with weak support in the 
data, thus controlling for overfitting. We used weighting to minimize the biases 
introduced by the artificial case-control ratio‘ and calculated hazard ratios 
relative to the (approximate) true cumulative incidence of about 1-3/1,000 in the 
given age range over a follow-up of 10-20 years. The observed driver mutation 
frequency and VAF in pre-AML cases closely resembled values expected based 
on the estimated risks, indicating that risk model and driver prevalence are well 
aligned (Extended Data Fig. 4). Full details of model derivation and comparisons 
with alternative methods are included in the accompanying code (Supplementary 
Note, also available at https://github.com/gerstung-lab/preAML). In brief, variables 
comprised age, gender and the VAF of putative driver mutations (see ‘Curation of 
oncogenic variants’ for details of variant curation). We performed agnostic impu- 
tation of missing variables by mean and linear rescaling of gene variables by a 
power of 10 to a magnitude of 1. The model was first trained separately on the 
discovery cohort and validation cohort. For each of these two models, we evalu- 
ated the following measures of predictive accuracy before and after leave-one-out 
cross-validation (LOOCV): concordance (C)*° and time-dependent area under 
the receiver-operating characteristic curve (AUC)*”. The models trained on the 
validation and discovery cohorts were then cross-validated using the data from the 
other cohort. In view of the cross-validation results and close correlation between 
coefficients (Supplementary Table 3), we derived a model on the combined cohorts 
in order to estimate the individual effects with greater accuracy. 
Confidence intervals were calculated using 100 bootstrap samples. The coeffi- 
cients and performance metrics for each iteration of the model are included in 
Supplementary Table 3. 

Concordance measures were obtained using the survConcordance() function 
implemented in the survival R package**. Dynamic AUC was calculated with 
AUC.uno() implemented in the survAUC package. Time-independent AUCs were 
calculated using the performance function implemented in the ROCR package. 
The expected incidence of AML was calculated from UK Office for National 
Statistics data, available at 
http://www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-by-cancer-type/leukaemia-aml/incidence. 
All-cause mortality data were obtained from the Office for National Statistics 
(https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables). 
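
For readers unfamiliar with the concordance measure, the following Python sketch computes an unweighted Harrell-type concordance index. It illustrates the statistic only and is not a reimplementation of survConcordance(): pairs tied in time are simply skipped, ties in risk score count as one half, and the case-control weighting used in the study is not included.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the earlier event has the higher risk score.

    times       : follow-up time for each individual
    events      : 1 if the event (AML diagnosis) was observed, 0 if censored
    risk_scores : model-predicted risk, higher meaning earlier expected event
    """
    concordant = discordant = tied = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event strictly before j's time.
            if events[i] and times[i] < times[j]:
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] < risk_scores[j]:
                    discordant += 1
                else:
                    tied += 1
    comparable = concordant + discordant + tied
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of individuals by time to event.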

Ridge-regularized logistic regression. Using the same covariates as in ‘Cox 
proportional hazards model with random effects’, we fitted a ridge-regularized logistic 
regression model to dichotomised outcome data. While logistic regression is a 
common choice for case-control analyses, a downside of this approach is the 
inability to explicitly use time-dependent covariates. The penalty parameter was 
chosen using LOOCV on the full cohort; this value was then used on the discovery 
cohort and validation cohort to yield the same scaling of coefficients. Confidence 
intervals were calculated using 100 bootstrap samples. Fitting was performed using 
the glmnet R package. AUC as the primary performance metric was calculated 
using the ROCR R package. 

Additional regression models. Two alternative predictive models were developed. 
Model 1 performs logistic-regression-based predictions using four types of 
features: gender, age at blood sampling, the sum of the VAFs of ARCH-PD mutations reported 
in COSMIC v.80 to be recurrent (at least two case reports in haematopoietic and 
lymphoid tissues) and somatic mutation burden of selected genes, where each gene 
was represented by the sum of the VAFs corresponding to ARCH-PD mutations 
in that gene. We measured the predictive performance of each gene via the AUC 
obtained in a fivefold cross-validation when using only the gene as a predictive 
feature, and only retained genes with AUC > 55% in the final model. 

For model 2 we applied LASSO regression as implemented in the glmnet R 
package, while enabling LOOCV to fit a Cox regression model. A minimal subset 
of ARCH-PD variants was selected for which the respective weighted combined 
VAFs were highly predictive of AML development in the training set. Scores were 
calculated for each patient as a linear combination of VAF of mutations weighted by 
regression coefficients that were estimated from the training data. As most scores 
were zero in the training subset, non-zero scores were discretized to take on a value 
of 1 that corresponds to AML prediction. 
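
The scoring step of model 2 can be sketched as follows. The variant keys and coefficient values in the usage note are hypothetical placeholders, not the fitted LASSO coefficients, which are not given in this section.

```python
def arch_pd_score(vafs, coefficients):
    """Linear combination of VAFs weighted by regression coefficients.

    vafs         : mapping from variant (or gene) to VAF in one individual
    coefficients : mapping from variant (or gene) to fitted coefficient;
                   variants not selected by the model contribute nothing
    """
    return sum(coefficients.get(name, 0.0) * vaf for name, vaf in vafs.items())


def predict_aml(vafs, coefficients):
    """Discretize the score: any non-zero score is taken as an AML prediction (1)."""
    return 1 if arch_pd_score(vafs, coefficients) != 0 else 0
```

With illustrative coefficients `{"DNMT3A R882": 2.0, "TP53": 5.0}`, an individual carrying only a TET2 variant scores zero and is predicted not to develop AML, whereas any detectable DNMT3A R882 clone yields a prediction of 1.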

Models 1 and 2 were trained on the discovery cohort and tested for their asso- 
ciation with AML development using the validation cohort data. Survival analysis 
was performed using the Kaplan-Meier and Cox proportional hazards models. 
Wald’s test was used to evaluate the significance of hazard ratios. Logistic regression 
models were used with the positive predictive value metric to determine the ability 
of various mutations and other patient parameters to predict AML development. 
The rms R package was used for logistic regression analysis, and the pROC 1.8 R 
package was used for receiver-operating characteristic curve analysis. 
AML-predictive model based on electronic health records. Clalit database. The 
Clalit database includes information from patients covered by the Clalit health 
services in Israel”® during the years 2002-2017. The Clalit training-set data 
contain the electronic health records (EHR) of 3.45 million individuals per year on 
average. All data were anonymized through hashing of personal identifiers and 
addresses and randomization of dates by sampling a random number of weeks 
for each patient and adding it to all dates in the patient diagnoses, laboratory 
and medication records. Because the same offset was applied to every date of a 
patient, the time intervals between events within each patient’s record were 
preserved. Diagnosis codes were acquired from both primary care and hospitalization 
records, and were mapped to the ICD-9 coding system for historical reasons, with 
few exceptions that used a partial ICD-10 coding system. Laboratory records were 
normalized for age and gender by subtracting raw test values from the median 
levels observed among all test values with matching gender and age (using a bin 
size of five years). We observed some chronological biases in laboratory ranges, 
but avoided normalizing these and instead ensured that cases and controls were matched 
for chronological distributions. 
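
The age- and gender-normalization of laboratory values can be sketched in a few lines of Python. The record layout (dictionaries with `test`, `gender`, `age` and `value` keys) is hypothetical, and the sign convention (value minus reference median) is an assumption; the text specifies only that raw values and group medians are subtracted within five-year age bins.

```python
from collections import defaultdict
from statistics import median


def age_bin(age, bin_size=5):
    """Lower edge of the age bin, e.g. 42 -> 40 for five-year bins."""
    return int(age // bin_size) * bin_size


def fit_reference_medians(records):
    """Median value per (test, gender, age bin) over all available records."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["test"], r["gender"], age_bin(r["age"]))].append(r["value"])
    return {key: median(values) for key, values in groups.items()}


def normalize(record, medians):
    """Deviation of one measurement from the median of its matching group."""
    key = (record["test"], record["gender"], age_bin(record["age"]))
    return record["value"] - medians[key]
```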

Defining AML cases. We screened for all active patients (18 < age < 100) who 
were diagnosed with AML (ICD-9 code 205.0*) between the years 2003 and 2016. 
We then excluded cases based on the following criteria. (1) We excluded patients 
with prior myeloid malignancies to omit secondary AML, consistent with the 
case selection for the genetic model. Patients were excluded if any of the following 
diagnoses was documented within five years before the diagnosis of AML: essential 
thrombocythemia (ICD-9 238.71); low-grade myelodysplastic syndrome (MDS) (ICD-9 
238.72); high-grade MDS lesions (ICD-9 238.73); MDS with 5q deletion (ICD-9 
238.74); MDS, unspecified (ICD-9 238.75); polycythemia vera (ICD-9 238.4); 
myelofibrosis (ICD-9 289.83); chronic myelomonocytic leukaemia (ICD-9 
206.10-206.22). 

(2) Patients that had any procedures performed on bone marrow or spleen 
(ICD-10 code Z41) in the five-year period before first mention of AML diagnosis 
code in their record. These patients were presumed to have an inaccurate AML 
diagnosis date or misdiagnosis recorded. 

(3) Patients that received medications suggestive of an alternative diagnosis 
of chronic myeloid leukaemia, lymphoid malignancy or acute promyelocytic 
leukaemia (APL). At any time before diagnosis: imatinib, dasatinib, anagrelide, 
hydroxycarbamide, asparaginase, pegaspargase or arsenic trioxide. At any time 
after diagnosis: imatinib, dasatinib, methotrexate, tretinoin or arsenic trioxide. At 
any time after diagnosis, along with any acute lymphoblastic leukaemia diagnosis 
(ICD-9 204) or more than a single dose: mercaptopurine. APL cases were excluded 
because early diagnosis of APL is unlikely to change its outcome, as treatment 
is already successful. 

(4) Patients without a hospitalization record within three months before or after 
the onset diagnosis. This parameter was used as it is unlikely that a patient with 
AML will not be hospitalized close to diagnosis. This filter reduced false-positive 
cases and better defined the onset date. 

We refined the estimated time of onset using the earliest time at which any of the 
following diagnoses appeared in the patient’s history: amyloidosis (ICD-9 277.3), 
lymphoid leukaemia (ICD-9 204), myeloid leukaemia (ICD-9 205), leukaemia of 
unspecified cell type (ICD-9 208). 

This strategy retained 875 AML cases in the training set for further analysis. 
These were further validated by manual expert inspection of the complete records 
of 8% of the cases. 

To define the control set, we included all Clalit individuals that were not cases. 
Since our analysis was aggregating data from a historical time window of 15 years, 
we associated each control with a randomized time point for evaluation. Using this 
approach, both cases and controls represented a specific time point in the historical 
record of a patient, with matching calendric, age and gender distributions. Through 
this strategy 5,238,528 controls were used. 

Defining features for construction of a predictive score. We extracted the 
following features for discriminative analysis of cases and controls (this procedure was 
applied repeatedly in cross-validation as discussed below). (1) Age (in years) at time 
point. (2) Gender. (3) Laboratory features. Out of 2,770 different types of labora- 
tory tests, we selected the top 50 most frequent laboratory tests (Supplementary 
Table 4). For each laboratory measurement, we used median age- and 
gender-normalized test values per patient in three time windows: 6-12 months before 
onset, 1-2 years before onset and 2-3 years before onset. In addition, we computed 
the slope of the normalized laboratory measurements for the 6-12 month time 
window using a linear regression model. (4) Diagnosis features. Of the 1,780 
different major ICD-9 diagnosis codes, we selected only diagnoses that were previously 
observed in at least 10 different cases and had a more than twofold increased relative 
risk for AML (as observed in the training set, Supplementary Table 4). For each 
diagnosis code, we marked whether it appeared in each patient’s record in time intervals of 
6 months to 3 years, and 3-5 years before onset. (5) BMI features. For each patient 
in the cohort, we extracted median BMI, weight and height as measured in time 
intervals of 6 months to 2 years, and 2-3 years before onset. 
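
The slope feature for the 6-12 month window can be illustrated with an ordinary least-squares fit. This is a minimal sketch: the function name is hypothetical, and returning `None` when fewer than two distinct time points are available is an assumption about how missing slopes might be represented.

```python
def ols_slope(times, values):
    """Least-squares slope of values against times.

    times  : measurement times within the window (for example, months before onset)
    values : age- and gender-normalized laboratory values at those times
    Returns None when a slope cannot be estimated.
    """
    n = len(times)
    if n < 2:
        return None
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        return None  # all measurements taken at the same time
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx
```

For measurements of 1.0, 0.4 and -0.2 taken 12, 9 and 6 months before onset (times -12, -9 and -6), the slope is -0.2 per month.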

Gradient boosting. We used the R package xgboost to infer parameters for a clas- 
sifier given cases and controls. Objective was set to binary:logistic, the evaluation 
metric to AUC. We set nrounds = 5000, eta = 0.001, gamma = 0.1, lambda = 0.01, 
alpha = 0.01, max_depth = 6, min_child_weight = 2, subsample = 0.7 and 
colsample_bytree = 0.7. The boosting algorithm reports a function f that computes 
a predictive score given the features. Given a threshold T the expression f(patient 
features) > T defines a classifier. To standardise thresholds we estimate quantiles 
for the scores on the training set T(p) = quantile(f(train),p) and define the clas- 
sifier for specificity level p as f(patient features) > T(p) (Supplementary Table 4). 
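
The threshold standardization T(p) = quantile(f(train), p) can be sketched without any boosting library. The quantile below uses linear interpolation between order statistics, which is the default (type 7) definition of R's quantile(); whether the authors used that default, and which subset of training scores entered the quantile, are assumptions here.

```python
def quantile(values, p):
    """Quantile with linear interpolation between order statistics (R type 7)."""
    ordered = sorted(values)
    h = (len(ordered) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


def make_classifier(train_scores, p):
    """Return a function that flags scores strictly above the p-quantile T(p)."""
    threshold = quantile(train_scores, p)
    return lambda score: score > threshold
```

If the training scores come from controls, a classifier built with p = 0.99 flags roughly the top 1% of control-like individuals, which is what makes p interpretable as a specificity level.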

Cross-validation and relative risk evaluation. To evaluate the predictive value 
of the classification scheme while considering the strong age and gender biases in 
the incidence of AML, we performed fivefold cross-validation after splitting the 
cases and controls into five age- and gender-matched groups. For each fold, we 
sampled 100,000 controls and combined with the cases, constructed the feature 
set and trained the model. The model was then tested on the fold cases along 
with 200,000 sampled controls. We used standardized classifier parameters and 
standardized thresholds that were inferred based on each training set to generate 
a series of classifications on each test set and merged these based on the control 
quantiles in the test as described above. Given a threshold p to define high and 
low prediction score, we counted for each bin b that defines a patient in a specific 
age (<40, 40-50, 50-60, 60-70, 70-80, >80) and gender group: the number of 
cases in bin b ($N^b_{\mathrm{case}}$) and the number of controls in bin b ($N^b_{\mathrm{control}}$), where $N^b$ is 
the number of patients in bin b (entire database minus recall controls that are only 
a sample of the cohort). $N^b$(case, high score) = $N^b_{\mathrm{TP}}$ indicates the number of true 
positives (TP); $N^b$(case, low score) = $N^b_{\mathrm{FN}}$ indicates the number of false negatives 
(FN); $N^b$(control, high score) = $N^b_{\mathrm{FP}}$ indicates the number of false positives (FP); 
$N^b$(control, low score) = $N^b_{\mathrm{TN}}$ indicates the number of true negatives (TN). 

For each age and gender group, the absolute risk for AML in the bin is 
computed by $r^b_{\mathrm{abs}} = N^b_{\mathrm{case}}/N^b$. The absolute risk given a high score is estimated 
as $r^b_{\mathrm{abs,high}} = N^b_{\mathrm{TP}}/(N^b_{\mathrm{TP}} + N^b_{\mathrm{FP}})$. The relative risk in the bin is defined by 
$rr^b = r^b_{\mathrm{abs,high}}/r^b_{\mathrm{abs}}$, where the sensitivity level for the classifier threshold level is 
defined as $\mathrm{sens}^b = N^b_{\mathrm{TP}}/N^b_{\mathrm{case}}$. 


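$$
rr = \frac{\dfrac{\dfrac{\mathrm{TP} \times \mathrm{cases}}{\mathrm{TP} + \mathrm{FN}}}{\dfrac{\mathrm{TP} \times \mathrm{cases}}{\mathrm{TP} + \mathrm{FN}} + \dfrac{\mathrm{FP} \times \mathrm{controls}}{\mathrm{FP} + \mathrm{TN}}}}{\dfrac{\mathrm{cases}}{\mathrm{cases} + \mathrm{controls}}}
$$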
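
One reading of the relative-risk calculation in this section is that the true-positive and false-positive rates observed on the sampled test set are scaled back to the numbers of cases and controls in the full database before the risk ratio is taken. The sketch below follows that reading; the function and argument names are hypothetical, and the interpretation of `cases` and `controls` as database-wide counts for the bin is an assumption.

```python
def relative_risk(tp, fn, fp, tn, cases, controls):
    """Relative risk of AML given a high score, for one age and gender bin.

    tp, fn, fp, tn  : confusion-matrix counts on the (sampled) test set
    cases, controls : numbers of cases and controls in the bin of the full database
    """
    expected_tp = cases * tp / (tp + fn)      # high-scoring cases expected in the database
    expected_fp = controls * fp / (fp + tn)   # high-scoring controls expected in the database
    risk_given_high = expected_tp / (expected_tp + expected_fp)
    baseline_risk = cases / (cases + controls)
    return risk_given_high / baseline_risk
```

A classifier that flags everyone gives a relative risk of 1, which is a useful sanity check on the scaling.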


Clonal growth rate calculation. Individual clones were defined by different 
mutations in different study participants. Per clone we calculated α according to the 
following equation: 

$$\alpha = \log(V/V_0)/(T - T_0)$$

where $T$ and $T_0$ indicate the age of the individual at the two measurement time 
points. $V$ and $V_0$ correspond to the VAF at $T$ and $T_0$, respectively. 
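
As a worked illustration, the growth rate can be computed directly from two serial VAF measurements. The use of the natural logarithm is an assumption, since the text does not state the base, and the function name is hypothetical.

```python
import math


def clonal_growth_rate(vaf_start, vaf_end, age_start, age_end):
    """Growth rate log(V / V0) / (T - T0) between two measurement time points."""
    if age_end == age_start:
        raise ValueError("the two time points must differ")
    return math.log(vaf_end / vaf_start) / (age_end - age_start)
```

Under this convention, a clone whose VAF doubles from 2% to 4% between ages 60 and 65 has a growth rate of ln(2)/5, about 0.14 per year; a stable clone has a rate of zero.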

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. Code for derivation of the prediction model is publicly 
available on GitHub (https://github.com/gerstung-lab/preAML). Code for the analysis of 
error-corrected sequencing is available from the Shlush lab upon request. 

Data availability. Targeted sequencing data for the discovery cohort are deposited 
as BAM files at the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) 
under accession number EGAD00001003583. All other data are available 
from the corresponding authors upon reasonable request. Sequencing data for the 
validation cohort are deposited at the European Genome-phenome Archive with 
accession number EGAD00001003703. 
Extended Data Fig. 1 | Prevalence of ARCH-PD mutations with VAF > 10% according to age. Red and blue lines represent the proportion of 
pre-AML cases and controls, respectively, that had ARCH-PD mutations with VAF > 10%. 
Extended Data Fig. 2 | Serially collected sampling supports long-lived 
HSPCs as the cell of origin for most ARCH-PD clones. a, b, VAF 
different colour, with circles denoting individual serial samples and solid 
dashed lines indicate the time interval between the last sampling and the 
end of follow-up (controls) or AML diagnosis (cases). c, Clonal growth 
rates (α) are shown for 27 control clones corresponding to 54 time points 
and 13 pre-AML clones corresponding to 15 time points. Box plot centres, 
hinges and whiskers represent the median, first and third quartiles and 
1.5× interquartile range, respectively. 
Extended Data Fig. 3 | Performance of the combined model in 
predicting progression to AML. a, Receiver operating characteristic curve 
for prediction of AML development using model 1 (see Methods). The red 
dot indicates the point on the curve with the highest positive predictive 
value with sensitivity of 41.9% and specificity of 95.7%. b, c, Kaplan-Meier 
estimates of time to AML diagnosis for individuals predicted to develop 
AML (red) and not develop AML (blue) using model 1 (b; hazard ratio, 
10.38; P = 4.2 × 10⁻¹⁰, Wald test) and model 2 (c; hazard ratio, 10.75; 
P = 1.75 × 10⁻⁸, Wald test), from the point of enrolment until the end of 
follow-up for patients enrolled in the EPIC study. 
Extended Data Fig. 4 | AML predictive models. a-c, Time-dependent 
receiver operating characteristic curve for Cox proportional hazards 
model trained on the discovery cohort (n = 505 unique individuals, 
91 pre-AML and 414 controls) (a), validation cohort (n = 291 unique 
individuals, 29 pre-AML and 262 controls) (b) and combined cohorts (c). 
d-f, Dynamic AUC for Cox proportional hazards models trained on the 
discovery cohort (d), validation cohort (e) or combined cohort (f). 
g, h, Red and blue bars indicate the observed and expected VAF (g) 
and driver frequency (h) of pre-AML cases and controls for each gene 
of the number of cases per 100,000 control individuals in the EHR 
database. The centre values and error bars define the mean and s.d., 
respectively. 
Extended Data Fig. 7 | Laboratory measurements contributing to the 
EHR model. Normalized laboratory measurements for pre-AMLs (red) 
and controls (blue) (middle) and their association (bottom) with higher 
risk of AML are shown. The grey bars indicate the percentage of pre-AML 
cases with laboratory results either below the 1st percentile or above 
the 99th percentile. Box plot centres, hinges and whiskers represent the 
median, first and third quartiles and 1.5× interquartile range, respectively. 
Extended Data Fig. 8 | Top 50 parameters for the EHR model. The 
relative contribution of the top 50 features incorporated into the EHR 
prediction model, ranked according to their predictive value (gain). 1Y, 
one year before AML diagnosis; 2Y, two years before AML diagnosis; 
3Y, three years before AML diagnosis; BASO%, percentage of basophils; 
BMI, body mass index; EOS.abs, absolute eosinophil count; EOS%, 
percentage of eosinophils; HYPO%, percentage of hypochromia; LUC, 
large unstained cells; LYM%, percentage of lymphocytes; LYMPH.abs, 
absolute lymphocyte count; MACRO%, percentage of macrocytosis; MCH, 
mean corpuscular haemoglobin; MCV, mean corpuscular volume; MON%, 
percentage of monocytes; MONO.abs, absolute monocyte count; MPV, 
mean platelet volume; NEUT.abs, absolute neutrophil count; NEUT%, 
percentage of neutrophils; PLT, platelet count; RBC, red blood cell count; 
RDW, red cell distribution width; WBC, white blood cell count. 

Extended Data Fig. 9 | Distribution of EHR model parameters. Heat
map illustrating absolute values of clinical measurements. Blue, white and
red indicate low, intermediate and high values, respectively. Light grey
indicates missing data. False-negative and true-positive annotations are
indicated at the bottom as dark-grey and yellow colour bars, respectively.
1Y, one year before AML diagnosis; 2Y, two years before AML diagnosis;
3Y, three years before AML diagnosis; BASO%, percentage of basophils;
EOS%, percentage of eosinophils; EOS.abs, absolute eosinophil count; HCT,
haematocrit; HDL, high density lipoprotein; HGB, haemoglobin; Hyper%,
percentage of hyperchromia; Hypo%, percentage of hypochromia; LDL,
low density lipoprotein; LUC, large unstained cells; LYM%, percentage
of lymphocytes; LYMP.abs, absolute lymphocyte count; MACRO%,
percentage of macrocytosis; MCH, mean corpuscular haemoglobin;
MCHC, mean corpuscular haemoglobin concentration; MCV, mean
corpuscular volume; MICR%, percentage of microcytosis; MON%,
percentage of monocytes; MONO.abs, absolute monocyte count; MPV,
mean platelet volume; PLT, platelet count; NEUT%, percentage of
neutrophils; NEUT.abs, absolute neutrophil count; RBC, red blood cell
count; RDW, red cell distribution width; Transamina, transaminase;
Transpeptid., transpeptidase; TSH, thyroid stimulating hormone; WBC,
white blood cell count.

Extended Data Table 1 | Genes sequenced by cRNA bait pull-down in the validation cohort

cul? RAD51 CBiC 
CoH23 IDH2 U2AF? 
MPL PTEN CREBBP ASXL1 
SMC3 SMG1 PTPRT 
HRAS CBB GNAS 
WT1 CTCF RUNX1 
SF1 SMPO3 U2AF1 
MYB EED PRPF8 CSF2RB 
CNTN5 TP53 CBX? 
MLL NFA EP300 
CB SUZ12 ZRSR2 
IDH1 LUC7L2 ETV6 STAT5B BCOR 
CUL3 BRAF KRAS KANSL1 KDM6A 
GIGYF2 CUL1 MLL2 DCAF7 GATA1 
CBLB EZH2 PRPF40B SRSF2 SMC1A 
GATA2 MLL3 PPFIA2 ASXL3 PHF8 
STAG1 RAD21 SH2B3 SETBP1 MED12 
PIK3CA MYC PTPN11 DNMT1 ATRX 
FRYL JAK2 FLT3 EPOR RPS6KA6 
KIT CDKN2A PDS5B JAK3 DIAPH2 
UGT2A3 HNRNPK DCLK1 CEBPA STAG2 
TET2 NOTCH1 RB1 ZFP36 PHF6 


LETTER 


https://doi.org/10.1038/s41586-018-0326-5 


Reprogramming human T cell function and 
specificity with non-viral genome targeting 


Theodore L. Roth, Cristina Puig-Saus, Ruby Yu, Eric Shifrut, Julia Carnevale, P. Jonathan Li,
Joseph Hiatt, Justin Saco, Paige Krystofinski, Han Li, Victoria Tobin, David N. Nguyen, Michael R. Lee,
Amy L. Putnam, Andrea L. Ferris, Jeff W. Chen, Jean-Nicolas Schickel, Laurence Pellerin, David Carmody,
Gorka Alkorta-Aranburu, Daniela del Gaudio, Hiroyuki Matsumoto, Montse Morell, Ying Mao, Min Cho,
Rolen M. Quadros, Channabasavaiah B. Gurumurthy, Baz Smith, Michael Haugwitz, Stephen H. Hughes,
Jonathan S. Weissman, Kathrin Schumann, Jonathan H. Esensten, Andrew P. May, Alan Ashworth, Gary M. Kupfer,
Siri Atma W. Greeley, Rosa Bacchetta, Eric Meffre, Maria Grazia Roncarolo, Neil Romberg, Kevan C. Herold,
Antoni Ribas, Manuel D. Leonetti & Alexander Marson


Decades of work have aimed to genetically reprogram T cells for 
therapeutic purposes¹,² using recombinant viral vectors, which 
do not target transgenes to specific genomic sites³,⁴. The need for 
viral vectors has slowed down research and clinical use as their 
manufacturing and testing is lengthy and expensive. Genome 
editing brought the promise of specific and efficient insertion of 
large transgenes into target cells using homology-directed repair⁵,⁶. 
Here we developed a CRISPR-Cas9 genome-targeting system that 
does not require viral vectors, allowing rapid and efficient insertion 
of large DNA sequences (greater than one kilobase) at specific 
sites in the genomes of primary human T cells, while preserving 
cell viability and function. This permits individual or multiplexed 
modification of endogenous genes. First, we applied this strategy 
to correct a pathogenic IL2RA mutation in cells from patients 
with monogenic autoimmune disease, and demonstrate improved 
signalling function. Second, we replaced the endogenous T cell 
receptor (TCR) locus with a new TCR that redirected T cells to a 
cancer antigen. The resulting TCR-engineered T cells specifically 
recognized tumour antigens and mounted productive anti-tumour 
cell responses in vitro and in vivo. Together, these studies provide 
preclinical evidence that non-viral genome targeting can enable 
rapid and flexible experimental manipulation and therapeutic 
engineering of primary human immune cells. 

A major barrier to effective non-viral T cell genome targeting of 
large DNA sequences has been the toxicity of the DNA⁷. Although the 
introduction of short single-stranded oligodeoxynucleotide (ssODN) 
homology-directed repair (HDR) templates does not cause notable 
T cell toxicity, it has been shown that larger linear double-stranded 
DNA (dsDNA) templates are toxic at high concentrations⁸,⁹. Contrary 
to expectations, we found that co-electroporation of human primary 
T cells with CRISPR-Cas9 ribonucleoprotein (RNP)¹⁰,¹¹ complexes and 
long (>1 kb) linear dsDNA templates reduced the toxicity associated 
with the dsDNA template (Extended Data Fig. 1a-e). Cas9 RNPs were 
co-electroporated with a dsDNA HDR template designed to introduce 
an N-terminal green fluorescent protein (GFP) fusion in the housekeeping 
gene RAB11A (Fig. 1a). Both cell viability and the efficiency 
of this approach were optimized by systematic exploration (Fig. 1b and 
Extended Data Fig. 1f-h), resulting in GFP expression in up to 50% 
of primary human CD4⁺ and CD8⁺ T cells. The method was reproducibly 
efficient with high cell viability (Fig. 1c-e). The system is also 
compatible with current manufacturing protocols for cell therapies. The 
method can be used with fresh or cryopreserved cells, bulk T cells or 
sub-populations sorted by fluorescence activated cell sorting (FACS), 
and cells from whole blood or leukapheresis (Extended Data Fig. 2a-d). 
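
As a rough illustration of how an HDR template for a knock-in can be laid out, the sketch below joins a left homology arm, the insert and a right homology arm around an insertion point in a reference sequence. The function name `build_hdr_template`, the arm length and the toy sequences are assumptions made for illustration; they are not the RAB11A template or any sequence used in this study.

```python
def build_hdr_template(reference: str, insert_pos: int, insert: str, arm_len: int) -> str:
    """Return left homology arm + insert + right homology arm.

    reference  : reference sequence around the target site (toy data here)
    insert_pos : 0-based index in `reference` where the insert should land
    insert     : sequence to be knocked in (for example a tag)
    arm_len    : length of each homology arm, in bases
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if insert_pos - arm_len < 0 or insert_pos + arm_len > len(reference):
        raise ValueError("homology arms would run past the reference sequence")
    left_arm = reference[insert_pos - arm_len:insert_pos]
    right_arm = reference[insert_pos:insert_pos + arm_len]
    return left_arm + insert + right_arm


# Hypothetical toy example: a 6-base insert (lower case) with 4-base arms.
reference = "AAAACCCCGGGGTTTTACGTACGT"
template = build_hdr_template(reference, insert_pos=12, insert="atgtag", arm_len=4)
print(template)
```

Real templates described in the text are far longer (more than 1 kb), and a practical design would also need to consider the guide RNA cut site and reading frame, which this sketch leaves out.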

We next confirmed that the system could be applied broadly by tar- 
geting sequences in different locations throughout the genome. We 
efficiently engineered primary T cells by generating GFP fusions with 
different genes (Fig. 2a and Extended Data Fig. 2e-g). Live-cell imaging 
with confocal microscopy confirmed the specificity of gene targeting, 
revealing the distinct sub-cellular locations of each of the resulting 
GFP-fusion proteins¹² (Fig. 2b). Appropriate chromatin binding of a 
transcription factor GFP-fusion protein was confirmed by performing 
genome-wide CUT&RUN (cleavage under targets and release using 
nuclease)¹³ analysis with an anti-GFP antibody (Fig. 2c and Extended 
Data Fig. 2h). Finally, we showed that gene targeting preserved the 
regulation of the modified endogenous gene. Consistent with 
correct cell-type specific expression, a CD4-GFP fusion was selectively 
expressed in the CD4⁺ population of T cells (Fig. 2d). Using HDR templates 
encoding several fluorescent proteins, we demonstrated that we 
could generate cells with bi-allelic gene targeting (Fig. 2e and Extended 
Data Fig. 3a-d) or multiplex modification of two (Fig. 2f and Extended 
Data Fig. 3e-h) or even three (Fig. 2g and Extended Data Fig. 3i) different 
genes¹⁴,¹⁵. These results show that several endogenous genes can be 
directly engineered without virus in T cells, and that gene and protein 
regulation are preserved. 

For therapeutic use of genetically modified T cells, integrated 
sequences should be introduced specifically without unintended disruption 
of other critical genome sites¹⁶. We performed targeted locus 

Fig. 1 | Efficient non-viral genome targeting in primary human T cells.
a, HDR-mediated integration of a GFP fusion tag to the housekeeping gene
RAB11A. b, Development and optimization of non-viral genome targeting
for both cell viability and HDR efficiency. c, Insertion of a GFP fusion
into the endogenous RAB11A gene using non-viral targeting in primary
human gated CD4⁺ and CD8⁺ T cells. HDRT, HDR template. d, Average
efficiency with the RAB11A-GFP HDR 
n = 12 independent healthy donors are shown 
Fig. 1. 

off-target integrations at the single-cell level by quantifying GFP⁺ 
cells generated using a Cas9 RNP that cuts outside the homology site. 

Fig. 2 | Individual and multiplexed modification of endogenous T cell 
genes. a, Non-viral genome targeting with GFP-fusion constructs into 
several endogenous genes. b, Confocal microscopy of live human T cells 
electroporated with the indicated HDR templates confirmed fusion-protein 
localization. Scale bars, 5 µm. c, GFP fused to the endogenous 
transcription factor BATF enabled genome-wide binding analysis 
(CUT&RUN) using anti-GFP or anti-BATF antibodies. d, RAB11A-fusions 
produced GFP-positive CD4⁺ and CD8⁺ cells, whereas the CD4-fusions 
were selectively expressed in CD4⁺ cells. e, Bi-allelic non-viral 
genome targeting of two distinct fluorescent proteins into the same locus. 
f, Multiplexed non-viral genome targeting of HDR templates into two 
separate genomic loci. g, Simultaneous targeting of three distinct genomic 
loci. Cells positive for one (Q-II, Q-III) or two (Q-IV) integrations were 
highly enriched for a third HDR integration. BFP, blue fluorescent protein. 
One representative donor displayed from n = 6 (a), n = 4 (b, d-g), or n = 2 
(c) independent healthy donors. See also Extended Data Figs. 2, 3. 

Fig. 3 | Monogenic autoimmune mutations corrected by non-viral 
genome targeting. a, Pedigree of family with monogenic immune 
disease caused by compound heterozygous (het) mutations in IL2RA 
(Supplementary Table 4). b, Correction of c.530A>G IL2RA mutation 
by non-viral genome targeting in three compound heterozygous 
siblings rescued IL-2Rα cell surface expression on CD3⁺ T cells 2 days 
after electroporation. AIN, autoimmune neutropenia; ITP, idiopathic 
thrombocytopenic purpura. c, Seven days after non-viral genome 
targeting, targeted unselected CD3⁺ T cells showed increased STAT5 
phosphorylation (pSTAT5) levels after IL-2 stimulation compared to 
non-targeted controls. d, Non-viral genome targeting corrected the c.800delA 
mutation using D10A nickase and a long ssDNA HDR template. 
IL-2Rα surface expression was measured after 9 days of ex vivo expansion 
following electroporation (2 days after re-stimulation). n = 3 (b, c) or 
n = 1 (d) compound heterozygous patients per correction. See also 
Extended Data Figs. 6-8. 


found evidence to suggest that double-stranded templates could integrate 
independent of target homology¹⁹,²⁰, albeit at low rates (Extended 
Data Fig. 4c-i). These rare events could be reduced almost completely 
by using single-stranded DNA (ssDNA) templates²¹,²² (Extended Data 
Fig. 5a-d). As an additional safeguard that could be important for some 
applications, we demonstrated that efficient non-viral T cell genome 
targeting also could be achieved using either a single-stranded or a 
double-stranded template with a Cas9 ‘nickase’ engineered to reduce potential 
off-target double-stranded cuts²³,²⁴ (Extended Data Fig. 5e-h). 
Having optimized this non-viral genome engineering approach in 
primary human T cells, we demonstrated its use in two different clinically 
relevant settings in which the targeted replacement of a gene 
would provide proof-of-principle that the method can be used to create 
therapeutically relevant gene modifications. Specifically, we tested the 
ability to rapidly and efficiently correct an inherited genetic alteration 
in T cells, and we also tested the targeted insertion of the two chains 
of a TCR to redirect the specificity of T cells to recognize cancer cells. 

We identified a family with monogenic primary immune defi- 
ciency with autoimmune disease caused by recessive loss-of-function 
mutations in the gene encoding the IL-2α receptor (IL2RA)²⁵ 
(Supplementary Table 4), which is essential for healthy regulatory 
T (Treg) cells²⁶ (Extended Data Fig. 6a-h). Whole-exome sequencing 
revealed that the IL2RA-deficient children contained compound 
heterozygous mutations in IL2RA (Fig. 3a and Extended Data Fig. 6i). 
One mutation, c.530A>G, creates a premature stop codon. With 
non-viral genome targeting, we were able to correct the mutation and 
observed IL-2Rα expression on the surface of corrected T cells from the 
patient (Fig. 3b). Long dsDNA templates led to efficient correction of 
the mutations. Because only two base pair changes were necessary (one 
to correct the mutation and one to silently remove the PAM sequence 
of the guide RNA (gRNA)), a short ssDNA (approximately 120 base 
pairs (bp)) could also be used to make the correction. These ssDNAs 
were able to correct the mutation at high frequencies, although here the 
efficiency of correction was lower than with the longer dsDNA template 
(Extended Data Figs. 7a, 8a). Correction was successful in T cells from 
all three siblings, but lower rates of IL-2Rα expression were seen in 
compound heterozygote 3, which could be due to altered cell-state asso- 
ciated with the patient's disease or the fact that she was the only sibling 
treated with immunosuppressive therapy (Supplementary Table 4 and 
Extended Data Fig. 8f). The second mutation identified, c.800delA, 
causes a frameshift in the reading frame of the final IL2RA exon. This 
frameshift mutation could be corrected both by HDR and by RNP 
cutting alone, presumably owing to some of the small indels restoring 
the reading frame (Extended Data Fig. 8). Together, these data show 
that distinct mutations can be corrected in patient T cells using HDR 
template-dependent and non-HDR template-dependent mechanisms. 
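
The idea that small indels can rescue a frameshift comes down to reading-frame arithmetic: a one-base deletion is compensated by any further indel that brings the net length change to a multiple of three. The sketch below is a toy check of that rule. The function name and the candidate indel sizes are illustrative assumptions, not data from this study, and restoring the frame does not by itself guarantee a functional protein.

```python
def restores_frame(original_shift: int, indel_shift: int) -> bool:
    """True if the combined length change is a multiple of three.

    Shifts are net length changes in bases: negative for deletions,
    positive for insertions.
    """
    return (original_shift + indel_shift) % 3 == 0


# c.800delA removes a single base, so the starting shift is -1.
C800DELA_SHIFT = -1

# Hypothetical indel sizes that repair of a Cas9 cut might leave behind.
candidate_indels = [-5, -2, -1, +1, +2, +4]

restoring = [s for s in candidate_indels if restores_frame(C800DELA_SHIFT, s)]
print(restoring)
```

Under this rule, a +1 insertion or a 2-base deletion near the original lesion would bring the reading frame back, whereas a further 1-base deletion would not.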

Mutation correction improved cell signalling function. After cor- 
rection of the c.530A>G IL2RA mutation, treatment with IL-2 led to 
increased phosphorylation of STAT5, a hallmark of productive signalling 
(Fig. 3c and Extended Data Figs. 7c, 8c). In addition, after correction, 
we found that the modified T cells expressed both IL-2Rα and 
FOXP3, a crucial transcription factor in Treg cells (Extended Data 
Figs. 7d, 8d). We were also able to correct the IL2RA mutation in a 
sorted population of CD3⁺CD4⁺CD127ˡᵒTIGIT⁺CD45RO⁺ Treg-like 
cells from a patient (Extended Data Fig. 7e, f), a strategy that could 
potentially be used in a gene-modified cell therapy for the children in 
this family. Cell-type specific and stimulus responsive expression of 
IL2RA is under tight control by multiple endogenous cis-regulatory 
elements that constitute a super-enhancer²⁷,²⁸. Therefore, effective therapeutic 
correction of the IL2RA defect is likely to depend on repairing 
the gene in its endogenous genomic locus; off-target effects should be 
avoided. We therefore demonstrated that the c.800delA mutation could 
also be repaired using Cas9 nickase combined with a single-stranded 
HDR template (Fig. 3d). 

Non-viral genome targeting not only allows the correction of point 
mutations, but also enables integration of much larger DNA sequences. 
We were able to use a large DNA construct to rapidly reprogram the 
antigen specificity of human T cells, which is critical for many cellular 
immunotherapy applications. Recent work demonstrates that chimaeric 
antigen receptors (CARs) have enhanced efficacy when they are genet- 
ically encoded in the endogenous TCR locus using CRISPR-Cas9 gene 
cutting and an adeno-associated virus vector as a repair template⁵. 
Targeting of specific TCR sequences to this locus is a more challenging 
problem because T cells must express paired TCR alpha (TCR-α) and 
beta chains (TCR-β) to make a functional receptor. 

We developed a strategy to replace the endogenous TCR using 
non-viral genome targeting to integrate an approximately 1.5-kb 
DNA cassette into the first exon of the TCR-α constant region (TRAC) 
(Fig. 4a). This cassette encoded the full-length sequence of a TCR-β 
separated by a self-excising 2A peptide from the variable region of a 
new TCR-α, which encodes the full TCR-α sequence when appropriately 
integrated at the endogenous TRAC exon (Extended Data 
Fig. 9a-d). To test this strategy, we introduced a TCR-β and TCR-α 
pair (1G4) that recognizes the NY-ESO-1 tumour antigen²⁹ into the 
TRAC locus of polyclonal T cells isolated from healthy human donors. 
Antibody staining for total TCR-α/β expression and NY-ESO-1-MHC 
dextramer staining for the NY-ESO-1 TCR expression revealed that 
non-viral genome targeting enabled reproducible replacement of the 
endogenous TCR in both CD8⁺ and CD4⁺ primary human T cells 
(Fig. 4b and Extended Data Fig. 9k). NY-ESO-1 TCR cells could also 
be generated with a similar targeting strategy at the TCR-β constant 
region (TRBC1/2) or with multiplexed simultaneous replacement of 
both endogenous TCR-α and TCR-β (Extended Data Fig. 9e-i). Most 
of the T cells that did not express NY-ESO-1 TCR were TCR knockouts 
(Fig. 4b), presumably due to non-homologous end joining (NHEJ) 
events induced by the Cas9-mediated double-stranded DNA breaks 
in TRAC exon 1. Up to around 70% of resulting TCR-positive cells 
recognized the NY-ESO-1 dextramer. 

Fig. 4 | Replacement of the endogenous TCR by non-viral genome targeting. 
a, Schematic of HDR template used to replace the endogenous TCR. 
b, Non-viral genome targeting successfully replaced the endogenous TCR 
with the NY-ESO-1 antigen-specific 1G4 TCR. c, Antigen-specific 
cytokine production and degranulation in CD8⁺ T cells with the replaced TCR. 
d, Antigen-specific target cell killing by CD8⁺ T cells with the replaced TCR. 
e, Melanoma tumour mouse xenograft model. NSG mice, non-obese diabetic 
(NOD)/severe combined immunodeficiency (SCID)/Il2rg⁻/⁻ mice. 
f, Scalability of non-viral replacement of the endogenous TCR for adoptive 
cell therapy. g, Preferential in vivo localization of NY-ESO-1 TCR⁺ T cells 
to the tumour. h, Tumour growth after adoptive transfer of NY-ESO-1 TCR⁺ 
non-virally or lentivirally modified T cells or vehicle alone (saline). One 
representative donor from n = 6 (b) or n = 2 (c, d) independent healthy donors, 
with mean and s.d. of technical triplicates (c, d). n = 6 (f) or n = 2 (g, h) 
independent healthy donors in 5 (g) or 7 mice (h) with mean and s.d. (f-h). 
**P < 0.01, ***P < 0.001, ****P < 0.0001 (two-way analysis of variance 
(ANOVA) with Holm-Sidak’s multiple comparisons test). See also Extended 
Data Figs. 9, 10. 

Next, we assessed the tumour antigen-specific function of targeted 
human T cells. When the targeted T cells were co-cultured with two 
different NY-ESO-1⁺ melanoma cell lines, M257 and M407, the modified 
T cells robustly and specifically produced IFNγ and TNF and 
induced T cell degranulation (measured by CD107a surface expression) 
(Fig. 4c). Cytokine production and degranulation only occurred when 
the NY-ESO-1 TCR T cells were exposed to cell lines expressing the 
appropriate human leukocyte antigen (HLA)-A*0201 class I major 
histocompatibility complex (MHC) allele required to present the cognate 
NY-ESO-1 peptide to the TCR. Both the CD8⁺ and CD4⁺ T cell 
responses were consistent across healthy donors, and were comparable 
to the response of T cells from the same healthy donor in which the 
NY-ESO-1 TCR was transduced by gamma retrovirus and hetero- 
logously expressed using a viral promoter (Fig. 4c and Extended Data 
Fig. 9j). NY-ESO-1 TCR knock-in T cells rapidly killed target M257- 
HLA-A*0201 cancer cells in vitro at rates similar to the positive control, 
retrovirally transduced T cells (Fig. 4d). Killing was selective for target 
cells expressing NY-ESO-1 antigen and the HLA-A*0201 allele, con- 
sistent across donors, and depended on the T cells being modified using 
both the correct gRNA and HDR template (Extended Data Fig. 9n-q). 

Finally, we confirmed that non-viral genome targeting could be used 
to generate NY-ESO-1 TCR cells at scale and that these cells have in 
vivo anti-tumour function (Fig. 4e and Extended Data Fig. 10a). Given 
that knock-in efficiency was lower with non-viral targeting than with 
comparable sized adeno-associated virus templates⁵, we first wanted to 
ensure that we could generate sufficient numbers of NY-ESO-1-positive 
cells for adoptive cell therapies. We electroporated 100 million T cells 
from six healthy donors, which after ten days of expansion yielded an 
average of 385 million NY-ESO-1 TCR T cells per donor (Fig. 4f and 
Extended Data Fig. 9i-m). NY-ESO-1 TCR knock-in T cells preferentially 
localized to, persisted at, and proliferated in the tumour rather 
than the spleen, similar to positive control lentivirally transduced 
T cells (Fig. 4g and Extended Data Fig. 10b-f). Adoptive transfer of 
sorted NY-ESO-1 TCR T cells also reduced the tumour burden in 
treated animals (Fig. 4h). 
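
The scale-up figures quoted above can be turned into a simple yield calculation. The sketch below assumes that 100 million cells were electroporated per donor and that yield scales linearly with input; both are simplifying assumptions, and the one-billion-cell target is a hypothetical number chosen only to show the arithmetic.

```python
def fold_yield(cells_in: float, cells_out: float) -> float:
    """Edited cells recovered per input cell electroporated."""
    if cells_in <= 0:
        raise ValueError("cells_in must be positive")
    return cells_out / cells_in


def cells_to_electroporate(target_cells: float, fold: float) -> float:
    """Input cells needed for a target dose, assuming linear scaling."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return target_cells / fold


# Figures from the text: 100 million cells in, 385 million NY-ESO-1 TCR cells
# out on average after ten days of expansion (per-donor input is assumed).
fold = fold_yield(100e6, 385e6)

# Hypothetical target of one billion edited cells.
needed = cells_to_electroporate(1e9, fold)
print(f"fold yield: {fold:.2f}, input needed: {needed / 1e6:.0f} million cells")
```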

Our therapeutic gene editing in human T cells is a process that takes 
only a short time from target selection to production of the genetically 
modified T cell product. In approximately one week, novel gRNAs and 
DNA repair templates can be designed, synthesized, and the DNA inte- 
grated into primary human T cells that remain viable, expandable, and 
functional. The whole process and all required materials can be easily 
adapted to good manufacturing practices for clinical use. Avoiding the 
use of viral vectors will accelerate research and clinical applications, 
reduce the cost of genome targeting, and potentially improve safety. 

Looking forward, the technology could be used to ‘rewire’ complex 
molecular circuits in human T cells. Multiplexed integration of large 
functional sequences at endogenous loci should allow combinations 
of coding and non-coding elements to be corrected, inserted, modi- 
fied and rearranged. Much work remains to be done to improve our 
understanding of endogenous T cell circuitry if we are going to create 
synthetic circuits. Rapid and efficient non-viral tagging of endogenous 
genes in primary human cells will facilitate live-cell imaging and pro- 
teomic studies to decode T cell programs. Non-viral genome targeting 
provides an approach to re-write these programs in cells for the next 
generation of immunotherapies. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at 
https://doi.org/10.1038/s41586-018-0326-5. 


Received: 19 November 2017; Accepted: 4 June 2018; 
Published online 11 July 2018. 


1. Sadelain, M., Rivière, I. & Riddell, S. Therapeutic T cell engineering. Nature 545, 
423-431 (2017). 

2. Lim, W.A. & June, C. H. The principles of engineering immune cells to treat 
cancer. Cell. 168, 724-740 (20 

3. Rosenberg, S. A. & Restifo, N. P. Adoptive cell transfer as personalized 
immunotherapy for human cancer. Science 348, 62-68 (2015). 

4. Verhoeyen, E., Costa, C. & Cosset, F.-L. Lentiviral vector gene transfer into human 
T cells. Methods Mol. Biol. 506, 97-114 (2009). 

5. Eyquem, J. et al. Targeting a CAR to the TRAC locus with CRISPR/Cas9 
enhances tumour rejection. Nature 543, 113-117 (2017). 

6. Hale, M. et al. Homology-directed recombination for enhanced engineering of 
chimeric antigen receptor T cells. Mol. Ther. Methods Clin. Dev. 4, 192-203 (2017). 

7. Cornu, T. I., Mussolino, C. & Cathomen, T. Refining strategies to translate 
genome editing to the clinic. Nat. Med. 23, 415-423 (2017). 

8. Zhao, Y. et al. High-efficiency transfection of primary human and mouse T 
lymphocytes using RNA electroporation. Mol. Ther. 13, 151-159 (2006). 

9. Hornung, V. & Latz, E. Intracellular DNA recognition. Nat. Rev. Immunol. 10, 
123-130 (2010). 

10. Kim, S., Kim, D., Cho, S. W., Kim, J. & Kim, J. S. Highly efficient RNA-guided 
genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. 
Genome Res. 24, 1012-1019 (2014). 

11. Schumann, K. et al. Generation of knock-in primary human T cells using Cas9 
ribonucleoproteins. Proc. Natl Acad. Sci. USA 112, 10437-10442 (2015). 

12. Leonetti, M. D., Sekine, S., Kamiyama, D., Weissman, J. S. & Huang, B. A scalable 
strategy for high-throughput GFP tagging of endogenous human proteins. Proc. 
Natl Acad. Sci. USA 113, E3501-E3508 (2016). 

13. Skene, P. J. & Henikoff, S. An efficient targeted nuclease strategy for 
high-resolution mapping of DNA binding sites. eLife 6, e21856 (2017). 

14. Bak, R. O. et al. Multiplexed genetic engineering of human hematopoietic stem 
and progenitor cells using CRISPR/Cas9 and AAV6. eLife 6, e27873 (2017). 

15. Agudelo, D. et al. Marker-free coselection for CRISPR-driven genome editing in 
human cells. Nat. Methods 14, 615-620 (2017). 

16. Lux, C. T. & Scharenberg, A. M. Therapeutic gene editing safety and specificity. 
Hematol. Oncol. Clin. North Am. 31, 787-795 (2017). 

17. Cain-Hom, C. et al. Efficient mapping of transgene integration sites and local 
structural changes in Cre transgenic mice using targeted locus amplification. 
Nucleic Acids Res. 45, e62 (2017). 

18. Dever, D. P. et al. CRISPR/Cas9 β-globin gene targeting in human 
haematopoietic stem cells. Nature 539, 384-389 (2016). 



19. Murnane, J. P., Yezzi, M. J. & Young, B. R. Recombination events during 
integration of transfected DNA into normal human cells. Nucleic Acids Res. 18, 
2733-2738 (1990). 

20. Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated 
homology-independent targeted integration. Nature 540, 144-149 (2016). 

21. Quadros, R. M. et al. Easi-CRISPR: a robust method for one-step generation of 
mice carrying conditional and insertion alleles using long ssDNA donors and 
CRISPR ribonucleoproteins. Genome Biol. 18, 92 (2017). 

22. Li, H. et al. Design and specificity of long ssDNA donors for CRISPR-based 
knock-in. Preprint at https://doi.org/10.1101/178905 (2017). 

23. Mali, P., Esvelt, K. M. & Church, G. M. Cas9 as a versatile tool for engineering 
biology. Nat. Methods 10, 957-963 (2013). 

24. Ran, F. A. et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced 
genome editing specificity. Cell 154, 1380-1389 (2013). 

25. Sharfe, N., Dadi, H. K., Shahar, M. & Roifman, C. M. Human immune disorder 
arising from mutation of the alpha chain of the interleukin-2 receptor. Proc. Natl 
Acad. Sci. USA 94, 3168-3171 (1997). 

26. Sakaguchi, S., Sakaguchi, N., Asano, M., Itoh, M. & Toda, M. Immunologic 
self-tolerance maintained by activated T cells expressing IL-2 receptor 
alpha-chains (CD25). Breakdown of a single mechanism of self-tolerance 
causes various autoimmune diseases. J. Immunol. 155, 1151-1164 (1995). 

27. Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune 
disease variants. Nature 518, 337-343 (2015). 

28. Simeonov, D. R. et al. Discovery of stimulation-responsive immune enhancers 
with CRISPR activation. Nature 549, 111-115 (2017). 

29. Robbins, P. F. et al. Single and dual amino acid substitutions in TCR CDRs can 
enhance antigen-specific T cell functions. J. Immunol. 180, 6116-6131 (2008). 


Acknowledgements We thank members of the Marson laboratory, 
C. Jeans, K. Marchuk, J. Bluestone, Q. Tang, R. Wagner, the UCSF Biological 
Imaging Development Center, the UCSF Parnassus Center for Advanced 
Technology, the UCSF Parnassus Flow Cytometry Core (NIH P30 DK063720 
and 1S10OD021822-01), Lonza, J. Corn and S. Pyle for suggestions and 
assistance. This research was supported by NIH grants DP3DK111914-01 
(A.M.), P50GM082250 (A.M.), R35 CA197633 (A.R.), K23 DK094866 (S.W.G.), 
T32GM007618 (T.L.R., J.H.), T32 DK007418 (T.L.R.), and P30 DK020595 
(S.W.G.), the NIH NCI Intramural Program (A.L.F., S.H.H.), grants from the Keck 
Foundation (A.M.), National Multiple Sclerosis Society (A.M.; CA 1074-A-21), 
gifts from J. Aronov, G. Hoskin, the Jeffrey Modell Foundation (A.M.), and awards 
from the Burroughs Wellcome Fund (A.M.) and the Ressler Family Fund (C.P.S., 
J.S., A.R.). A.M. is a Chan Zuckerberg Biohub investigator. A.R. is a Parker 
Institute for Cancer Immunotherapy member. 


Reviewer information Nature thanks M. Maus, J. Wherry and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


Author contributions T.L.R. and A.M. designed the study and wrote the 
manuscript. T.L.R. designed and performed all electroporation experiments. 
T.L.R., R.Y., E.S., J.L., J.H., V.T., D.N.N. and K.S. contributed to functional assays 
of edited T cells. R.Y. performed and analysed CUT&RUN experiments. H.L., 
J.W. and M.D.L. developed the IVT-RT ssDNA production method. H.M., M.M., 
Y.M., B.S. and M.H. developed the exonuclease-based ssDNA production 
method. R.Q. and C.G. discussed the use of ssDNA. A.L.F. and S.H.H. advised on 
methods of DNA introduction into T cells. T.L.R., E.S., M.C. and A.P.M. performed 
amplicon sequencing. J.C., J.N.S., A.L.P., L.P., D.C., G.A.-A., D.D.G., G.M.K., S.W.G., 
R.B., E.M., M.G.R., N.R. and K.C.H. contributed to the clinical workup of 
IL2RA-deficient family and functional assays on unedited patient T cells. J.H.E. and 
M.R.L. performed TSDR analysis. T.L.R., C.P.S., E.S., A.R. and A.M. designed the 
endogenous TCR knock-in strategy. T.L.R., C.P.S., J.C., J.S., P.K., A.A. and A.R. 
performed or supervised in vitro assays of T cells with endogenous TCR 
knock-ins. T.L.R. designed and performed all mouse experiments. 


Competing interests A.M. is a co-founder of Spotlight Therapeutics. A.M. serves 
as an advisor to Juno Therapeutics and is a member of the scientific advisory 
board of PACT Pharma. The Marson laboratory has received sponsored research 
support (Juno Therapeutics, Epinomics, Sanofi) and a gift from Gilead. A.R. is 
co-founder and a member of the scientific advisory board of PACT Pharma. 
T.L.R., C.P.S., E.S., A.R. and A.M. are inventors on new patent applications related 
to this manuscript (US patent application no. 62/520,117, T.L.R. and A.M.; US 
patent application no. 62/578,153, T.L.R., C.P.S., E.S., A.R. and A.M.). 


Additional information 

Extended data is available for this paper at 
https://doi.org/10.1038/s41586-018-0326-5. 

Supplementary information is available for this paper at 
https://doi.org/10.1038/s41586-018-0326-5. 

Reprints and permissions information is available at 
http://www.nature.com/reprints. 

Correspondence and requests for materials should be addressed to A.M. 
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


19 JULY 2018 | VOL 559 | NATURE | 409 


METHODS 


Data reporting. No statistical methods were used to predetermine sample size. 
For all in vivo experiments, experimental conditions were allocated randomly at 
the time of adoptive transfer, and experimental conditions were mixed among 
littermates. For in vivo tumour sizing experiments, the investigator was blinded 
to experimental condition. No power analysis was used to determine sample sizes. 
Antibodies. All antibodies used in the study for fluorescence activated cell sort- 
ing, flow cytometry and cellular stimulations are listed in Supplementary Table 2. 
Guide RNAs. All gRNAs used in the study are listed in Supplementary Table 3. 
Isolation of human primary T cells for gene targeting. Primary human T cells 
were isolated from healthy human donors either from fresh whole blood, resid- 
uals from leukoreduction chambers after Trima Apheresis (Blood Centers of the 
Pacific), or leukapheresis products (StemCell). Peripheral blood mononuclear cells 
(PBMCs) were isolated from whole blood samples by Ficoll centrifugation using 
SepMate tubes (STEMCELL, per manufacturer's instructions). T cells were isolated 
from PBMCs from all cell sources by magnetic negative selection using an EasySep 
Human T Cell Isolation Kit (STEMCELL, per manufacturer’s instructions). Unless 
otherwise noted, isolated T cells were stimulated as described below and used 
directly (fresh). When frozen cells were used, previously isolated T cells that had 
been frozen in Bambanker freezing medium (Bulldog Bio) per manufacturer's 
instructions were thawed, cultured in media without stimulation for 1 day, and 
then stimulated and handled as described for freshly isolated samples. Fresh blood 
was taken from healthy human donors under a protocol approved by the UCSF 
Committee on Human Research (CHR #13-11950). Patient samples used for gene 
editing were obtained under a protocol approved by the Yale Human Investigation 
Committee (HIC). Additional leukapheresis products from healthy donors were 
collected either under UCLA Institutional Review Board (IRB) approval #10- 
001598 or purchased from AllCells, LLC. All patients and healthy donors provided 
informed consent. 

Primary human T cell culture. Unless otherwise noted, bulk T cells were cultured
in XVivo15 medium (STEMCELL) with 5% fetal bovine serum (FBS), 50 µM
2-mercaptoethanol, and 10 µM N-acetyl L-cystine. Immediately after isolation,
T cells were stimulated for 2 days with anti-human CD3/CD28 magnetic dynabeads
(ThermoFisher) at a bead-to-cell ratio of 1:1, along with a cytokine cocktail
of IL-2 at 200 U ml⁻¹ (UCSF Pharmacy), IL-7 at 5 ng ml⁻¹ (ThermoFisher), and
IL-15 at 5 ng ml⁻¹ (Life Tech). After electroporation, T cells were cultured in media
with IL-2 at 500 U ml⁻¹. Throughout the culture period T cells were maintained
at an approximate density of 1 million cells per ml of media. Every 2-3 days after
electroporation, additional media was added, along with additional fresh IL-2 to
bring the final concentration to 500 U ml⁻¹, and cells were transferred to larger
culture vessels as necessary to maintain a density of 1 million cells per ml.

RNP production. RNPs were produced by complexing a two-component gRNA
to Cas9, as previously described. In brief, crRNAs and tracrRNAs were chemically
synthesized (Dharmacon, IDT), and recombinant Cas9-NLS, D10A-NLS,
or dCas9-NLS were produced and purified (QB3 Macrolab).
Lyophilized RNA was resuspended in 10 mM Tris-HCl (pH 7.4) with 150 mM
KCl at a concentration of 160 µM, and stored in aliquots at −80°C. crRNA and
tracrRNA aliquots were thawed, mixed 1:1 by volume, and annealed by incubation
at 37°C for 30 min to form an 80 µM gRNA solution. Recombinant Cas9 or
the D10A Cas9 variant was stored at 40 µM in 20 mM HEPES-KOH, pH 7.5,
150 mM KCl, 10% glycerol, 1 mM DTT, and was then mixed 1:1 by volume with the
80 µM gRNA (2:1 gRNA to Cas9 molar ratio) at 37°C for 15 min to form an RNP
at 20 µM. RNPs were electroporated immediately after complexing.
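
The concentrations in the RNP production step follow from two successive 1:1 (by volume) mixes, each of which halves the concentration of both components. A minimal Python sketch of that arithmetic, assuming ideal volume additivity; the helper name mix_equal_volumes is illustrative and not part of the published protocol:

```python
def mix_equal_volumes(conc_a_um, conc_b_um):
    """Return the final concentrations (in µM) of two solutions mixed 1:1 by volume."""
    return conc_a_um / 2, conc_b_um / 2


# crRNA and tracrRNA, each at 160 µM, are mixed 1:1 and annealed.
crrna_um, tracrrna_um = mix_equal_volumes(160, 160)
grna_um = min(crrna_um, tracrrna_um)  # annealed two-component gRNA

# The 80 µM gRNA is mixed 1:1 with 40 µM Cas9.
grna_final_um, cas9_final_um = mix_equal_volumes(grna_um, 40)
molar_ratio = grna_final_um / cas9_final_um  # gRNA : Cas9
rnp_um = cas9_final_um  # Cas9 is the limiting component of the complex

print(grna_um, molar_ratio, rnp_um)
```

The limiting component (Cas9) sets the nominal RNP concentration, which is why a 2:1 molar excess of gRNA still yields an RNP stock quoted at the Cas9 concentration.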
Double-stranded DNA HDRT production. Novel HDR sequences were con- 
structed using Gibson Assemblies to insert the HDR template sequence, consisting 
of the homology arms (commonly synthesized as gBlocks from IDT) and the 
desired insert (such as GFP) into a cloning vector for sequence confirmation and 
future propagation. These plasmids were used as templates for high-output PCR 
amplification (Kapa Hotstart polymerase). PCR amplicons (the dsDNA HDRT) 
were SPRI purified (1.0×) and eluted into a final volume of 3 µl H2O per 100 µl
of PCR reaction input. Concentrations of HDRTs were determined by nanodrop 
using a 1:20 dilution. The size of the amplified HDRT was confirmed by gel elec- 
trophoresis in a 1.0% agarose gel. All homology directed repair template sequences 
used in the study, both dsDNA and ssDNA, are listed in Supplementary Table 3. 
Single-stranded DNA HDRT production by exonuclease digestion. To produce 
long ssDNA as HDR templates, the DNA of interest was amplified via PCR using 
one regular, non-modified PCR primer and a second phosphorylated PCR primer. 
The DNA strand that will be amplified using the phosphorylated primer will be 
the strand that will be degraded using this method. This makes it possible to pre- 
pare either a single-stranded sense or single-stranded antisense DNA using the 
respective phosphorylated PCR primer. To produce the ssDNA strand of interest, 
the phosphorylated strand of the PCR product was degraded by treatment with 
two enzymes, Strandase Mix A and Strandase Mix B, for 5 min (per 1 kb) at 37°C, 
respectively. Enzymes were deactivated by a 5 min incubation at 80°C. The resulting
ssDNA HDR templates were SPRI purified (1.0×) and eluted in H2O. A more
detailed protocol for the Guide-it Long ssDNA Production System (Takara Bio, 
632644) can be found at the manufacturer’s website. 

Single-stranded DNA HDRT production by reverse synthesis. ssDNA HDR 
templates were synthesized by reverse transcription of an RNA intermediate fol- 
lowed by hydrolysis of the RNA strand in the resulting RNA:DNA hybrid product, 
as described. In brief, the desired HDR donor was first cloned downstream of
a T7 promoter and the T7-HDR donor sequence amplified by PCR. RNA was 
synthesized by in vitro transcription using HiScribe T7 RNA polymerase (New 
England Biolabs) and reverse-transcribed using TGIRT-III (InGex). After reverse 
transcription, NaOH and EDTA were added to 0.2 M and 0.1 M, respectively, and 
RNA hydrolysis was carried out at 95°C for 10 min. The reaction was quenched 
with HCl, the final ssDNA product purified using Ampure XP magnetic beads 
(Beckman Coulter) and eluted in sterile RNase-free H2O. ssDNA quality was ana- 
lysed by capillary electrophoresis (Bioanalyzer, Agilent). 

Primary T cell electroporation. RNPs and HDR templates were electroporated 2 days 
after initial T cell stimulation. T cells were collected from their culture vessels and mag- 
netic anti-CD3/anti-CD28 dynabeads were removed by placing cells on an EasySep 
cell separation magnet for 2 min. Immediately before electroporation, de-beaded cells 
were centrifuged for 10 min at 90g, aspirated, and resuspended in the Lonza electroporation
buffer P3 using 20 µl buffer per 1 million cells. For optimal editing, 1 million
T cells were electroporated per well using a Lonza 4D 96-well electroporation system 
with pulse code EH115. Alternate cell concentrations from 200,000 up to 2 million cells 
per well resulted in lower transformation efficiencies. Alternate electroporation buffers 
were used as indicated, but had different optimal pulse settings (EO155 for OMEM 
buffer). Unless otherwise indicated, 2.5 µl RNPs (50 pmol total) were electroporated,
along with 2 µl HDR template at 2 µg µl⁻¹ (4 µg HDR template total).
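
The per-well amounts quoted for the standard electroporation can be checked with simple unit arithmetic (1 µM equals 1 pmol per µl). The sketch below is illustrative only; the function and variable names are assumptions, and the numbers are the default values stated in the text (20 µM RNP stock, 2 µg per µl HDR template, 20 µl buffer per million cells):

```python
def amount_pmol(conc_um, volume_ul):
    """Amount in pmol for a solution at conc_um (µM) and volume_ul (µl)."""
    return conc_um * volume_ul  # 1 µM = 1 pmol per µl


def well_mix(cells_millions=1.0, rnp_ul=2.5, rnp_um=20.0,
             hdrt_ul=2.0, hdrt_ug_per_ul=2.0, buffer_ul_per_million=20.0):
    """Summarize reagent amounts and total volume for one electroporation well."""
    buffer_ul = buffer_ul_per_million * cells_millions
    return {
        "rnp_pmol": amount_pmol(rnp_um, rnp_ul),
        "hdrt_ug": hdrt_ul * hdrt_ug_per_ul,
        "total_ul": buffer_ul + rnp_ul + hdrt_ul,
    }


mix = well_mix()
print(mix)
```

With the default values the combined volume comes to 24.5 µl, slightly more than the 24 µl that the protocol transfers into the cuvette.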

The order of cell, RNP and HDRT addition appeared to matter (Extended Data 
Fig. 1). For 96-well experiments, HDRTs were first aliquoted into wells of a 96-well 
polypropylene V-bottom plate. RNPs were then added to the HDRTs and allowed 
to incubate together at room temperature for at least 30 s. Finally, cells resuspended 
in electroporation buffer were added, briefly mixed by pipetting with the HDRT 
and RNP, and 24 µl of total volume (cells plus RNP and HDRT) was transferred
into a 96-well electroporation cuvette plate. Immediately after electroporation,
80 µl of pre-warmed media (without cytokines) was added to each well, and cells
were allowed to rest for 15 min at 37°C in a cell culture incubator while remaining in
the electroporation cuvettes. After 15 min, cells were moved to final culture vessels. 
Flow cytometry and cell sorting. Flow cytometric analysis was performed on 
an Attune NxT Acoustic Focusing Cytometer (ThermoFisher) or an LSRII flow 
cytometer (BD). FACS was performed on the FACSAria platform (BD). Surface 
staining for flow cytometry and cell sorting was performed by pelleting cells and 
resuspending in 25 µl of FACS buffer (2% FBS in PBS) with antibodies at the indicated
concentrations (Supplementary Table 2) for 20 min at 4°C in the dark. Cells
were washed once in FACS buffer before resuspension. 

Confocal microscopy. Samples were prepared by drop casting 10 µl of a solution
of suspended live T cells onto a 3 × 1 inch (7.6 × 2.5 cm) microscope slide onto
which a 25 mm² coverslip was placed. Imaging was performed on an upright
configuration Nikon A1r laser scanning confocal microscope. Excitation was achieved
through a 488 nm OBIS laser (Coherent). A long working distance (LWD) 60×
Plan Apo 1.20 numerical aperture (NA) water immersion objective was used with
additional digital zoom achieved through the NIS-Elements software. Images
were acquired under 'Galvano' mirror settings with 2× line averaging enabled
and exported as TIFF to be analysed in FIJI (ImageJ, NIH).

CUT&RUN. CUT&RUN was performed using epitope-tagged primary human 
T cells 11 days after electroporation and 4 days after re-stimulation with anti-CD3/ 
anti-CD28 dynabeads (untagged cells were not electroporated). Approximately 
20% and 10% of electroporated cells showed GFP-BATF expression as determined 
by flow cytometry in donor 1 and donor 2 samples, respectively. CUT&RUN was 
performed as described, using anti-GFP (ab290), anti-BATF (sc-100974), and
rabbit anti-mouse (ab46540) antibodies. In brief, 6 million cells (30 million cells for 
anti-GFP CUT&RUN in GFP-BATF-containing cells) were collected and washed. 
Nuclei were isolated and incubated rotating with primary antibody (GFP or BATF) 
for 2 h at 4°C. BATF CUT&RUN samples were incubated for an additional hour
with rabbit anti-mouse antibody. Next, nuclei were incubated with proteinA- 
micrococcal nuclease (provided by the Henikoff laboratory) for 1 h at 4°C. Nuclei 
were equilibrated to 0°C and MNase digestion was allowed to proceed for 30 
min. Solubilized chromatin CUT&RUN fragments were isolated and purified. 
Paired-end sequencing libraries were prepared and analysed on Illumina Nextseq 
machines and sequencing data were processed as described. For peak calling and
heatmap generation, reads mapping to centromeres were filtered out. 

TLA sequencing and analysis. TLA sequencing was performed by Cergentis as
previously described. Similarly, data analysis of integration sites and transgene
fusions was performed by Cergentis as previously described. TLA sequencing
was performed in two healthy donors, each edited at the RAB11A locus with either
a dsDNA or ssDNA HDR template to integrate a GFP fusion (Fig. 1b). Sequencing
reads showing evidence of primer dimers or primer bias (that is, greater than 99%
of observed reads came from a single primer set) were removed.

In vitro Treg cell suppression assay. CD4+ T cells were enriched using the
EasySep Human CD4+ T cell enrichment kit (STEMCELL Technologies).
CD3+CD4+CD127loCD45RO+TIGIT+ enriched Treg-like cells from IL2RA-deficient
subjects and healthy donors as well as CD3+CD4+IL-2RαhiCD127lo
Treg cells from IL2RA+/− heterozygous individuals were sorted by flow cytometry.
CD3+CD4+IL-2Rα−CD127+ responder T (Tresp) cells were labelled with CellTrace
CFSE (Invitrogen) at 5 µM. Treg cells and healthy donor Tresp cells were co-cultured
at a 1:1 ratio in the presence of beads loaded with anti-CD2, anti-CD3 and
anti-CD28 (Treg Suppression Inspector; Miltenyi Biotec) at a 1 bead:1 cell ratio. On
days 3.5-4.5, co-cultures were analysed by FACS for CFSE dilution. The percentage
inhibition was calculated using the following formula: 1 − (% proliferation with Treg
cells/% proliferation of stimulated Tresp cells without Treg cells).
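
The inhibition formula above can be written as a short function. This is a minimal sketch, assuming both inputs are percentages of proliferated (CFSE-diluted) responder cells; the function name and the example values are illustrative and are not data from the study:

```python
def fraction_inhibited(prolif_with_treg_pct, prolif_without_treg_pct):
    """Return 1 - (% proliferation with Treg cells / % proliferation without)."""
    if prolif_without_treg_pct <= 0:
        raise ValueError("responder-only proliferation must be positive")
    return 1 - prolif_with_treg_pct / prolif_without_treg_pct


# Hypothetical example: 20% proliferation with Treg cells, 80% without.
inhibition = fraction_inhibited(20, 80)
print(f"{100 * inhibition:.0f}% inhibition")
```

A value of 0 means the Treg cells had no effect on responder proliferation, and a value of 1 means proliferation was fully suppressed.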

Sorting and TSDR analysis of corrected Treg cells. Ex vivo expanded Treg and T
effector cells from a healthy control and a patient with IL2RA compound
heterozygous mutations (D6) were thawed and stained. Live cells were sorted based on
expression of CD25 and CD62L markers directly into ZymoResearch M-digestion
Buffer (2×) (D5021-9) supplemented with proteinase K. The lysate was incubated at
65°C for greater than 2 h and then frozen. Bisulfite conversion and pyrosequencing
of the samples was performed by EpigenDx (assay ID ADS783-FS2) to interrogate
the methylation status of 9 CpG sites in intron 1 of the FOXP3 gene, spanning −2330
to −2263 from ATG.

Generation of retrovirally and lentivirally transduced control T cells. For 
retroviral infections, clinical grade MSGV-1-1G4 (NY-ESO-1 TCR transgene) 
retroviral vector (IUVPC) was used. For lentiviral production, HEK293 cells were 
plated at 18 million cells in 15 cm dishes the night before transfection. Cells were 
transfected using the Lipofectamine 3000 reagent following the manufacturer's 
protocol (L3000001). Transfection media was changed the following day to fresh 
HEK293 media (DMEM with 5% FBS and 1% penicillin/streptomycin) with viral
boost reagent at 500× per the manufacturer's protocol (Alstem viral boost reagent
VB100). Forty-eight hours after transfection, the viral supernatant was collected
and filtered. The Alstem precipitation solution was then added, and the mixture was
refrigerated at 4°C for 4 h and concentrated by centrifugation. The viral pellet was then
resuspended at 100× in cold PBS following the manufacturer's protocol (lentivirus
precipitation solution VC100).

T cells for viral infection were activated similarly to non-virally edited cells. 
Both retroviral and lentiviral transductions occurred 48 h after TCR/cytokine 
stimulus, followed by expansion in IL-2 similarly to non-virally edited cells. For 
retroviral transduction, T cells were infected by spinoculation in retronectin-coated 
(Clontech) plates. Control mock-transduced T cells were also generated. For lentiviral
transduction, viral concentrate was added to 1× final concentration.
Antigen-specific TCR expression analysis. The expression of the NY-ESO-1 TCR 
was assessed in virally and non-virally modified cells with an NY-ESO-1-specific 
(SLLMWITQC) dextramer-PE (Immudex) according to the manufacturer's protocol.
Negative dextramer (Immudex) was used as a negative control.

T cell activation and cytokine production analysis. Melanoma cell lines were 
established from the biopsies of melanoma patients under the UCLA IRB approval 
11-003254. Cell lines were periodically screened for mycoplasma contamination 
as well as authenticated using GenePrint 10 System (Promega), and were matched 
with the earliest passage cell lines. M257 (NY-ESO-1+ HLA-A*0201−), M257-A2
(NY-ESO-1+ HLA-A*0201+) and M407 (NY-ESO-1+ HLA-A*0201+) were co-cultured
1:1 with the modified PBMCs in cytokine free media. The recommended
amount per test of CD107a-APC-H7 (Supplementary Table 2) antibody was added 
to the co-culture. After 1 h, half the recommended amount of BD Golgi Plug and 
BD Golgi Stop (BD Bioscience) was added to the coculture. After 6 h, surface stain- 
ing was performed followed by cell permeabilization using BD cytofix/cytoperm 
(BD Bioscience) and intracellular staining according to manufacturer instructions 
(Supplementary Table 2). Negative dextramer and fluorescence minus one staining 
were used as controls. 

T cell in vitro killing assay. M202-nRFP (NY-ESO-1− HLA-A*0201+),
M257-nRFP (NY-ESO-1+ HLA-A*0201−), M257-A2-nRFP (NY-ESO-1+
HLA-A*0201+), M407-nRFP (NY-ESO-1+ HLA-A*0201+), and A375-nRFP
(NY-ESO-1+ HLA-A*0201+) melanoma cell lines stably transduced to express
nuclear RFP (Zaretsky 2016 NEJM) were seeded approximately 16 h before starting
the co-culture (~1,500 cells seeded per well). Modified T cells were added at the
indicated E:T ratios. All experiments were performed in cytokine free media. Cell
proliferation and cell death were measured by nRFP real time imaging using an
IncuCyte ZOOM (Essen) for 5 days.

In vivo mouse solid tumour model. All mouse experiments were completed 
under a UCSF Institutional Animal Care and Use Committee protocol. We used 
8-12-week-old NSG male mice (Jackson Laboratory) for all experiments. Mice 
were seeded with tumours by subcutaneous injection into a shaved right flank 
of 1 × 10⁶ A375 human melanoma cells (ATCC CRL-1619). At seven days after
tumour seeding, tumour size was assessed and mice with tumour volumes of
15-30 mm³ were randomly assigned to experimental and control treatment groups.
Indicated numbers of T cells were resuspended in 100 µl of serum-free RPMI and
injected retro-orbitally. For tumour sizing experiments, the length and width of
the tumour were measured using electronic calipers and volume was calculated as
v = 1/6 × π × length × width × (length + width)/2. The investigator was blinded
to experimental treatment group during sizing measurements. A bulk edited
T cell population (5 × 10⁶) or a sorted NY-ESO-1 TCR+ population (3 × 10⁶) was
transferred as indicated in figures and legends. For bulk edited T cell transfers,
lentivirally edited cells generally had a higher percentage of NY-ESO-1 TCR+ cells,
so mock-infected cells were added to normalize the percentage of total T cells
that were NY-ESO-1 TCR+ to equal that of the bulk population of non-virally edited
T cells (~10% NY-ESO-1 TCR+). For sorted T cell transfers, NY-ESO-1 TCR+ T cells were
FACS sorted eight days after electroporation, expanded for two additional days,
and frozen (Bambanker freezing medium, Bulldog Bio). Non-virally or lentivirally
modified human T cells were then thawed and rested in media overnight before
adoptive transfer. For flow cytometric analysis of adoptively transferred T cells,
single-cell suspensions from tumours and spleens were produced by mechanical
dissociation of the tissue through a 70-µm filter. All animal experiments were
performed in compliance with relevant ethical regulations per an approved ACUC 
protocol (UCSF), including a tumour size limit of 2.0 cm in any dimension. 
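
The tumour volume estimate used for sizing can be expressed as a small function. This is a hedged sketch: it reads the constant in the published formula as π, so that the expression is an ellipsoid volume whose unmeasured third axis is approximated by the mean of length and width. The function name and the example values are illustrative only:

```python
import math


def tumour_volume_mm3(length_mm, width_mm):
    """v = 1/6 * pi * length * width * (length + width) / 2, all in mm."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("caliper measurements must be non-negative")
    depth_estimate_mm = (length_mm + width_mm) / 2  # stand-in for the third axis
    return math.pi / 6 * length_mm * width_mm * depth_estimate_mm


# Hypothetical caliper readings of 4.0 mm by 3.0 mm.
volume = tumour_volume_mm3(4.0, 3.0)
print(f"{volume:.1f} mm^3")
```

For equal length and width the expression reduces to the volume of a sphere of that diameter, which is a quick sanity check on the formula.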
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. CUT&RUN data have been deposited in the Gene Expression 
Omnibus (GEO) under accession GSE108600. TLA and amplicon sequencing 
data are available upon request. Source data for animal experiments (Fig. 4g, h 
and Extended Data Fig. 10) are provided. Plasmids containing the HDR template 
sequences used in the study are available through AddGene. All other data are 
available from the corresponding author upon reasonable request. 
Extended Data Fig. 1 | Development of non-viral genome targeting
in primary human T cells. a, Except where noted otherwise, 'viability'
refers to the number of live cells in an experimental condition (expressed 
as a percentage) relative to an equivalent population that went through all 
protocol steps except for the actual electroporation (no electroporation 
control). ‘Efficiency’ refers to the percentage of live cells in a culture 
expressing the ‘knocked-in’ exogenous sequence (such as GFP). Finally, the 
total number of cells positive for the desired modification was calculated 
by multiplying the efficiency by the absolute cell count. Methodological 
changes that maximized efficiency were not always optimal for the total 
number of positive cells, and vice-versa. b, dsDNA, both circular (plasmid) 
and linear, when electroporated into primary human T cells, caused 
marked loss in viability with increasing amounts of template. 

Co-delivery of an RNP caused less reduction in viability post 
electroporation. Notably, no loss in viability was seen with ssODNs. 

c, RNPs must be delivered concurrently with DNA to see increased 
viability. T cells from two donors were each electroporated twice with an
8 h rest in between electroporations. Although two closely spaced
electroporations caused a high degree of cell death, the RNP
and linear dsDNA template could be delivered separately. Initial RNP
electroporation did not protect from the loss of viability if dsDNA was 
delivered alone in the second round of electroporation. d, We determined 
whether the order of adding reagents influenced targeting efficiency 

and viability. In wells in which the RNP and the DNA HDR template 

were mixed together before adding the cells (1. RNP + HDRT; 

2. + Cells), there was a marked increase in targeting efficiency. e, Note, 
with the high concentration of dsDNA used in these experiments, viability 
was higher if the RNP and cells were mixed first and the DNA template 
was added immediately before electroporation (1. RNP + Cells; 

2. + HDRT). Taken together, these data suggest that pre-incubation of the 
RNP and HDR template, even for a short period, increased the amount
of DNA HDR template delivered into the cell, which increased efficiency
but decreased viability. However, viability after RNP and dsDNA HDR
template pre-incubation was still higher than was observed with dsDNA
HDR template electroporation by itself (b). dsDNA HDR template (5 µg)
was used in c-e. f, Primary human T cells were cultured for 2 days using 
varying combinations of anti-CD3/CD28 TCR stimulation and cytokines 
before electroporation of RAB11A targeting RNP and HDR template, 
followed by varying culture conditions after electroporation. g, Among the 
RNP and HDR template concentrations tested here, optimal GFP 
insertion into RAB11A was achieved at intermediate concentrations 

of the RNP and dsDNA HDRT. h, Arrayed testing of electroporation 

pulse conditions showed that, in general, conditions yielding higher HDR 
efficiency decreased viability. EH115 was selected to optimize efficiency, 
while still maintaining sufficient viability. i, Diagrammatic timeline of 
non-viral genome targeting. Approximately one week is required to 
design, order from commercial suppliers, and assemble any novel 
combination of genomic-editing reagents (gRNA and the HDR template). 
Two days before electroporation, primary human T cells isolated 

from blood or other sources (Extended Data Fig. 2) are stimulated. 
dsDNA HDR templates can be made easily by PCR followed by a SPRI 
purification to achieve a highly concentrated and pure product suitable for 
electroporation. On the day of electroporation, the gRNA (complexed with 
Cas9 to form an RNP), the HDR template, and collected stimulated 

T cells are mixed and electroporated, a process taking approximately 1.5 h. 
After electroporation, engineered T cells can be readily expanded for an 
additional 1-2 weeks. Viability was measured 2 days after electroporation 
and GFP expression was measured at day 4. Graphs display mean (b, c, g, h)
and/or individual donor values (b-h) in n = 2 independent healthy donors 
(b-h). For d, e and h, one representative donor is shown. 
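
The three readouts defined in panel a (viability, efficiency, and total number of positive cells) can be sketched as follows. This is an illustration of the definitions only; the function names and the cell counts are hypothetical and do not come from the study:

```python
def viability_pct(live_cells, live_cells_no_electroporation_control):
    """Live cells as a percentage of the matched no-electroporation control."""
    return 100 * live_cells / live_cells_no_electroporation_control


def total_positive_cells(efficiency_pct, absolute_live_count):
    """Efficiency (% of live cells carrying the knock-in) times the live cell count."""
    return efficiency_pct / 100 * absolute_live_count


# Two hypothetical conditions: A is more efficient, B keeps more cells alive.
positives_a = total_positive_cells(efficiency_pct=40, absolute_live_count=200_000)
positives_b = total_positive_cells(efficiency_pct=25, absolute_live_count=400_000)
print(positives_a, positives_b)
```

In this made-up example condition B yields more edited cells despite its lower efficiency, which is the trade-off the caption describes between maximizing efficiency and maximizing the total number of positive cells.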




[Extended Data Fig. 2 image, panels a-h. Labels recoverable from the figure: y axes, Efficiency (% GFP+) and GFP Expression Level (MFI % of Max); x axes, T Cell Source (Whole Blood, Leukapheresis), T Cell Handling (Fresh, Frozen), T Cell Isolation (CD4+) (Sorting, Negative Selection), Cell Type Edited (PBMCs, Bulk CD3+, CD8+, CD4+, Tregs) and HDR Template (RAB11A-GFP, CLTA-GFP, TUBA1B-GFP, ACTB-GFP, FBL-GFP, BATF-GFP); heat map labels, CUT&RUN Antibody and Occupancy at BATF peaks in untagged cells; legend, Donors 1-6.]

Extended Data Fig. 2 | Non-viral genome targeting is consistent across
T cell types and reproducible across target loci. a, Efficient genome
targeting was accomplished with a variety of T cell processing and
handling conditions that are used with current manufacturing protocols
for cell therapies. Non-viral genome targeting of a RAB11A-GFP fusion
protein using a linear dsDNA HDR template was performed in bulk CD3+
T cells isolated from either whole blood draws or by leukapheresis.
b, Targeting was similar using bulk CD3+ T cells either fresh after
isolation or after cryopreservation (stored in liquid nitrogen and thawed
before initial activation). c, CD4+ T cells isolated by FACS showed
detectable GFP+ cells indicative of efficient editing, albeit at lower rates
than targeting in CD4+ cells isolated by negative selection (potentially
due to the added cellular stress of sorting). d, Using the same optimized
non-viral genome targeting protocol (Methods), a variety of T cell types
could be successfully edited, including peripheral blood mononuclear
cells, without any selection (T cell culture conditions cause preferential
growth of T cells from PBMCs). Sorted T cell subsets (CD8+, CD4+, and
CD4+IL-2Rα+CD127lo Treg cells) could be successfully targeted with GFP
integration. PBMCs were cultured for 2 days identically to primary
T cells (Methods). Bulk CD3+ T cells were isolated by negative
enrichment. The electroporations in d used only 2 µg of dsDNA HDR
template, a concentration that was later found to be less efficient than the
final 4 µg (contributing to the lower efficiencies seen compared to Fig. 1d).
A RAB11A-GFP template with on-target gRNA was used in a-d.
e, Four days after electroporation of different GFP templates along with a
corresponding RNP into primary CD3+ T cells from six healthy donors,
GFP expression was observed across both templates and donors. f, High
viability after electroporation was similarly seen across target loci. g, The
fusion tagged proteins produced by integrating GFP into specific genes
localized to the subcellular location of their target protein (Fig. 2b), and
were also expressed under the endogenous gene regulation, allowing
protein expression levels to be observed in living primary human
T cells. Note how GFP tags of the highly expressed cytoskeletal proteins
TUBA1B (alpha tubulin) and ACTB (beta actin) showed consistently
higher levels of expression compared to the other loci targeted across
six donors. GFP mean fluorescent intensity (MFI) was calculated for the
GFP+ cells in each condition/donor, and normalized as a percentage of
the maximum GFP MFI observed. h, Gene fusions not only permitted
the imaging and analysis of expression of endogenous proteins in live
cells, but also could be used for biochemical targeting of specific proteins.
For example, chromatin-immunoprecipitation followed by sequencing
(ChIP-seq), and more recently CUT&RUN, have been widely used to
map transcription factor-binding sites; however, these assays are often
limited by the availability of effective and specific antibodies. As a
proof-of-principle, we used anti-GFP antibodies to perform CUT&RUN analysis
in primary T cells in which the endogenous gene encoding the crucial
transcription factor BATF had been targeted to generate a GFP-fusion.
Binding sites identified with anti-GFP CUT&RUN closely matched
the sites identified with an anti-BATF antibody. Anti-BATF, anti-GFP
and no-antibody heat maps show CUT&RUN data obtained from primary
human T cell populations electroporated with GFP-BATF fusion HDR
template (untagged cells were not electroporated). Aligned CUT&RUN
binding profiles for each sample were centred on BATF CUT&RUN peaks
in untagged cells and ordered by BATF peak intensity in untagged cells.
The experiment in h was performed in two independent healthy donors.
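
The normalization described for panel g (GFP MFI expressed as a percentage of the maximum MFI observed) can be sketched in a few lines. The caption does not state whether the maximum is taken per donor or across all donors, so this sketch simply normalizes whatever list it is given; the function name and the values are illustrative, not data from the study:

```python
def normalize_to_max(mfi_values):
    """Express each MFI value as a percentage of the largest value in the list."""
    peak = max(mfi_values)
    if peak <= 0:
        raise ValueError("maximum MFI must be positive")
    return [100 * value / peak for value in mfi_values]


# Hypothetical MFI readings for three tagged loci in one donor.
normalized = normalize_to_max([500, 2000, 1000])
print(normalized)
```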
Extended Data Fig. 3 | Bi-allelic and multiplexed non-viral genome
targeting. a, We wanted to confirm that we could generate cells with
genome insertions in both alleles and quantify the frequency of bi-allelic
modifications. Targeting the two alleles of the same gene with two distinct
fluorophores would provide a way to quantify and enrich cells with
bi-allelic gene modifications. The possible cellular phenotypes and
genotypes when two fluorescent proteins are inserted into the same
locus are displayed. Importantly, the number of cells that express both
fluorescent proteins underestimates the percentage of cells with bi-allelic
integrations because some cells will have inserted either GFP or
mCherry on both alleles. We constructed a model to account for bi-allelic
integrations of the same fluorescent protein (Supplementary Note 1).
b, Diagram of bi-allelic integration model. The total percentage of cells
with bi-allelic HDR integrations must be the sum of genotypes D, E and
F. Although the proportion of cells with genotype E (dual fluor positives)
is immediately apparent from the phenotypes, genotypes D and F are
not. Our model allows for the de-convolution of the multiple genotypes in
the single fluor positive phenotypes, and thus an estimation of the true
percentage of cells bi-allelic for HDR. c, The observed level of bi-allelic
integrations was higher in cells that acquired at least one integration than 
would be expected by chance. Individual points represent replicates where 
the combination of the genes encoding the fluorescent proteins was varied 
(either GFP plus mCherry, GFP plus BFP, or mCherry plus BFP) as was 
the amount of the HDR template (3-6 µg). d, Bi-allelic HDR analysis was
applied across a variety of fluorophore permutations inserted into the 
RAB11A locus. e, Dual fluorescence bi-allelic integrations were seen across 
target loci. f, The data also suggest that cells with one integration were 
more likely to have also undergone a second targeted bi-allelic integration, 
and this effect was observed across three genomic loci. While the total 
percentage of cells with an insertion varied with the efficiency of each 
target site, the fold enrichment in the observed percentage of homozygous 
cells over that predicted by random chance was largely consistent across 
loci. g, Co-delivery of three fluorescent tags targeting the RAB11A
locus resulted in only a few cells that expressed all three fluorophores,
consistent with a low rate of off-target integrations. As a maximum of two
targeted insertions are possible (at the two alleles of the locus; assuming
a diploid genome), no cells positive for all three fluorophores should be observed
(triple positives). Indeed, while large numbers of single fluorophore 
integrations were observed (single positives), as well as cells positive 

for the various permutations of two fluorophores (double positives), 
there was an approximately 30-fold reduction in the number of triple 
positive cells compared to double positives. All flow cytometric analysis 
of fluorescent protein expression shown here was performed 4 days after 
electroporation. h, Multiplex editing of combinatorial sets of genomic 
sites would support expanded research and therapeutic applications. 

We tested whether multiple HDR templates could be co-delivered along 
with multiple RNPs to generate primary cells in which more than one 
locus was modified. Primary human T cells with two modifications were 
enriched by gating on the cells that had at least one modification, and this 
effect was consistent across multiple combinations of genomic loci. HDR 
template permutations from a set of six dsDNA HDR templates (targeting 
RAB11A, CD4 and CLTA; each site with GFP or RFP) were electroporated 
into CD3+ T cells isolated from healthy human donors. Four days after
electroporation of the two indicated HDR templates along with their
two respective on-target RNPs, the percentage of cells positive for each
template was analysed by gating on cells either positive or negative for 
the other template. Not only was two-template multiplexing possible 
across a variety of template combinations, but gating on cells positive for 
one template (template 1+ cells, blue) yielded an enriched population of 
cells more likely to be positive for the second template compared to cells 
negative for the first (template 1− cells, black). 2 μg of each template,
along with 30 pmol of each associated RNP, were electroporated for dual 
multiplexing experiments. i, We also achieved triple gene targeting and 
could enrich for cells that had a third modification by gating on the cells 
with two targeted insertions, an effect again consistent across target 
genomic loci. 1.5 μg of each template (4.5 μg total) were electroporated
together with 20 pmol of each corresponding RNP (60 pmol total). Graphs
display mean and s.d. in n = 4 (f–i) independent healthy donors. Other
experiments (c-e) were performed in two independent healthy donors. 
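
The comparison between observed and chance-expected bi-allelic integration (panels c and f) can be illustrated with a short calculation. The sketch below is not the authors' model. It assumes that, in cells with two integrations, each allele carries template A with a hypothetical probability q (0.5 if both templates are used equally), and that the chance expectation treats the two alleles as independent. All function names, parameters and input values are invented for illustration.

```python
import math


def deconvolve_biallelic(single_a, single_b, dual, q=0.5):
    """Estimate hidden homozygous genotypes from flow phenotypes.

    single_a, single_b: fractions of cells positive for only fluor A or only B.
    dual: fraction of cells positive for both fluors (genotype E).
    q: ASSUMED probability that an edited allele carries template A.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Under the independence assumption, E = 2*q*(1-q)*T, where T is the
    # total bi-allelic fraction; D = q^2 * T and F = (1-q)^2 * T.
    total_biallelic = dual / (2.0 * q * (1.0 - q))
    homo_a = q * q * total_biallelic            # genotype D (A/A)
    homo_b = (1.0 - q) ** 2 * total_biallelic   # genotype F (B/B)
    return {"D": homo_a, "E": dual, "F": homo_b, "biallelic": total_biallelic}


def fold_enrichment(single_a, single_b, dual, q=0.5):
    """Estimated bi-allelic fraction divided by the chance expectation."""
    observed = deconvolve_biallelic(single_a, single_b, dual, q)["biallelic"]
    any_integration = single_a + single_b + dual
    if not 0.0 < any_integration <= 1.0:
        raise ValueError("fractions must sum to a value in (0, 1]")
    # If both alleles were edited independently with per-allele probability p,
    # then P(at least one) = 1 - (1 - p)^2, so p = 1 - sqrt(1 - P).
    p = 1.0 - math.sqrt(1.0 - any_integration)
    expected = p * p
    return observed / expected


# Hypothetical fractions, not data from the figure.
result = deconvolve_biallelic(0.10, 0.10, 0.02)
enrichment = fold_enrichment(0.10, 0.10, 0.02)
```

With q = 0.5 the hidden homozygous genotypes D and F together equal the dual-positive fraction E, so the estimated bi-allelic total is twice what is visible as dual fluorescence. A fold enrichment above 1 corresponds to more bi-allelic cells than independent editing of the two alleles would predict.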
Extended Data Fig. 4 | Examination of off-target integrations with non-viral
genome targeting. a, Results of targeted locus amplification (TLA)
sequencing. No off-target integration sites were identified (assay's limit of 
detection ~1% of alleles) with either a dsDNA or ssDNA HDR template 
in two healthy donors. The on-target RAB11A locus on chromosome
15 is indicated in red. b, The frequency of one of the observed incorrect
integrations at the target locus was reduced using a long ssDNA HDR 
template in two human blood donors (Supplementary Note 2). c, Diagram 
of HDR-mediated insertions at the N terminus of a target locus. The 
homology arms specify the exact sequence where the insert (a GFP tag in 
this case) will be inserted, allowing for scarless integration of exogenous 
sequences. Because a GFP fusion protein is created, GFP fluorescence 
will be seen as a result of the on-target integration, which is dependent 
on an RNP cutting adjacent to the integration site. d, dsDNA can be 
integrated via homology-independent repair mechanisms at off-target 
sites through either random integration at naturally occurring dsDNA 
breaks, or potentially at induced double-stranded breaks, such as those 
at the off-target cut sites of the RNP. This effect can be harnessed to 
allow for targeted integration of a dsDNA sequence at a desired induced 
dsDNA break in quiescent cell types which lack the ability to do HDR, 
but crucially the entire sequence of the dsDNA template is integrated, 
including any homology arms. In the case that the homology arms 
contain a promoter sequence (such as for N-terminal fusion tags), these 
off-target integrations can drive observable expression of the inserted
sequence without the desired correct HDR insertion. e, We looked for 
unintended non-homologous integrations with the non-viral system 
using an N-terminal GFP-RAB11A fusion construct that contained the 
endogenous RAB11A promoter sequence within its 5’ homology arm. 
This construct could express GFP at off-target integration sites, which 
allowed us to assay for off-target events at the single-cell level using flow 
cytometry. Inclusion of a gRNA designed to cut a genome region that is 
not the homologous region to the targeting sequence can be used to infer 
integration at an off-target cut site. f, Although efficient GFP expression
depended on pairing the HDR template with the correct gRNA targeting
that site, rare GFP+ cells were observed when dsDNA HDR templates were
delivered either alone (~0.1%) or with an off-target Cas9 RNP (~1%).
g, Quantification of different types of functional off-target integrations.
The increase in the percentage of fluorescent cells over the limit of 
detection when the template alone is electroporated probably represents 
random integrations at naturally occurring dsDNA breaks (although
cut-independent integration at the homology site is also possible in theory).
Not every off-target integration will yield fluorescent protein expression
(for example, only part of the template sequence could be integrated
or it could be integrated in a way that does not lead to measurable
expression), but the relative differences in functional off-target expression 
between different templates and editing conditions can be assayed. 
Inclusion of an RNP targeting CXCR4 (off-target) markedly increased 
the observed off-target homology-independent integrations, probably
by a homology-independent insertion event. As expected, efficient GFP
expression was only seen with the correct gRNA sequence
and HDR-mediated repair. Bars represent observed GFP+ percentages
from T cells from one representative donor electroporated with the 
indicated components. h, Comparisons of on-target GFP expression 
versus functional off-target integrations across five templates reveal HDR 
is highly specific, but that off-target integrations can be observed at low 
frequencies. i, A matrix of gRNAs and HDR templates was electroporated
into bulk T cells from two healthy donors. The average GFP expression in
gated CD4+ T cells as a percentage of the maximum observed for a given
template is displayed. Across six unique HDR templates and gRNAs,
on-target HDR-mediated integration was by far the most efficient. One HDR
template, a C-terminal GFP fusion tag into the nuclear factor FBL, had
consistently higher off-target expression across gRNAs, potentially due
to a gene-trap effect as the 3’ homology arm for FBL contains a splice-site
acceptor followed by the final exon of FBL leading into the GFP fusion.
n = 2 (a, b, h, i) or n = 8 (e, f) independent healthy donors.
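
The normalization described for panel i (GFP expression as a percentage of the maximum observed for a given template) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function name, the dictionary layout and the numbers are all hypothetical.

```python
def percent_of_template_max(gfp_percent):
    """Rescale each template's GFP values to a percentage of its own maximum.

    gfp_percent: dict mapping template name -> dict of gRNA name -> % GFP+ cells.
    """
    normalized = {}
    for template, by_grna in gfp_percent.items():
        peak = max(by_grna.values())
        normalized[template] = {
            # Guard against a template with no GFP signal for any gRNA.
            grna: (100.0 * value / peak if peak > 0 else 0.0)
            for grna, value in by_grna.items()
        }
    return normalized


# Hypothetical values for illustration only (not data from the figure).
example = {
    "RAB11A-GFP": {"RAB11A": 40.0, "CD4": 0.4, "CLTA": 0.2},
    "CD4-GFP": {"RAB11A": 0.1, "CD4": 20.0, "CLTA": 0.1},
}
scaled = percent_of_template_max(example)
```

Scaling each template to its own maximum makes templates with very different absolute knock-in efficiencies comparable, so that off-target gRNA pairings stand out as a fraction of the on-target signal.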
Extended Data Fig. 5 | Non-viral genome targeting using long ssDNA
HDR templates and a Cas9 nickase. a, Long ssDNA templates have 
potential to reduce homology-independent integrations while preserving 
on-target efficiency. One method to generate long ssDNA templates 
involves a two-step selective exonuclease digestion that specifically 
degrades one strand of a PCR product that has been labelled by 5’ 
phosphorylation, which can be easily added to a PCR primer before 
amplification. b, We also applied a second ssDNA production method 
based on sequential in vitro transcription (IVT) and reverse transcription 
(RT) reaction. A PCR product with a short T7 promoter appended serves 
as an IVT template to produce a ssRNA product. After annealing of an 
RT primer and reverse transcription, an RNA-DNA hybrid can form, 
which then can be transformed into a long ssDNA template by incubation 
in sodium hydroxide, which selectively degrades the RNA strand. c, At 4 
days after electroporation, varying concentrations of a long ssDNA HDR
template (~1.3 kb) did not show the decreased viability observed in
CD3+ T cells electroporated with a linear dsDNA HDR template of the
same length. d, Electroporation of a ssDNA HDR template reduced
off-target integrations to the limit of detection (that is, comparable to levels
seen with no template electroporated) both with no nuclease added and at 
induced off-target dsDNA breaks (off-target gRNA + Cas9). e, Diagram 
of the genomic locus containing the first exon of RAB11A. Use of spCas9 
with an individual guide RNA (gRNA 1, ‘on-target’ in d) along with a 
dsDNA HDR template integrating a GFP in frame with RAB11A directly
after the start codon results in efficient GFP expression (Fig. 1d). Use of a
Cas9 nickase (D10A variant) with two gRNAs may reduce the incidence 
of off-target genome cutting. f, A series of individual gRNAs as well as 
dual gRNA combinations were tested for GFP insertion efficiency at the 
RAB11A N-terminal locus. As expected, no gRNAs showed appreciable 
levels of GFP insertion when using a nuclease-dead Cas9 (dCas9). Multiple
individual gRNAs that cut adjacent to the insertion site showed GFP 
integration when used with Cas9, but none were as efficient as gRNA 1. 
The D10A nickase showed little to no GFP integration with individual 
guides, but multiple two-guide combinations showed efficient GFP 
integration. Only in gRNA combinations where the two PAM sequences 
were directed away from each other (PAM Out) was GFP integration 
seen. g, GFP integration efficiencies as presented in f but graphed on a
logarithmic scale reveal lower levels of functional off-target integrations 
when using the D10A nickase compared to spCas9 (with an individual 
off-target gRNA, targeting CXCR4), probably due to the requirement for 
the D10A nickase to have two gRNAs bound in close proximity to induce 
a dsDNA break. h, Long ssDNA templates (~1.3 kb) could be successfully 
combined with Cas9 nickases (D10A) for targeted integration, similar to 
linear dsDNA templates. Here, long ssDNA HDR templates with D10A 
nickase showed lower efficiencies of GFP integration at the RAB11A site.
n = 2 (c, d, f, g) or n = 3 (h) independent healthy donors with mean
(c, d, f–h) and s.d. (h).
Extended Data Fig. 6 | Reduced Treg cell frequencies and function in
subjects with two loss-of-function IL2RA mutations. a, CD4+
T cells from a healthy donor and all family members, including IL2RA
heterozygotes (c.530 het 1, c.800 hets 1–3) as well as compound
heterozygous children (comp. hets 1–3), with loss-of-function IL2RA
mutations were analysed by flow cytometry to assess the presence of
IL-2RαhiCD127lo Treg cells. b, In healthy donors and individuals with
only one IL2RA mutation, CD4+FOXP3+ T cells are predominantly
IL-2RαhiCD127lo. In the compound heterozygotes, a CD127loCD4+FOXP3+
population is present, but does not express high levels of IL-2Rα.
c, Clinical phenotyping performed at two separate sites showed that
compound heterozygotes have CD127loFOXP3+ cells. d, Deficiency
in IL-2Rα surface expression in compound heterozygote 3 led to
aberrant downstream signalling as measured by pSTAT5 expression
after stimulation with IL-2, but not IL-7 or IL-15. e, Owing to the
inability to sort IL-2Rαhi Treg cells from the IL-2Rα-deficient compound
heterozygotes, FOXP3+ cells were enriched from CD4+ T cells using an alternate
gating strategy that used the surface markers CD127loCD45RO+TIGIT+.
Intracellular FOXP3 staining of T cells from the indicated gated population
is shown. f, Although these CD3+CD4+CD127loCD45RO+TIGIT+
potential Treg cells were highly enriched for FOXP3 and showed some
suppressive capacity when cultured with CFSE-labelled stimulated
Tresp cells from healthy donors, CD3+CD4+CD127loCD45RO+TIGIT+ cells
from the compound heterozygotes did not show suppressive ability.
Stimulated Tresp cell population (solid curves), non-stimulated Tresp cells
(dashed curve). g, Correction of either IL2RA mutation in the compound 
heterozygotes individually would still leave the other mutation, leaving 
the cells as single heterozygotes. To confirm that such a potential 
correction would result in some level of functional suppression, we 
assessed the suppressive ability of CD4+IL-2RαhiCD127lo Treg cells
from the c.530 and c.800 single heterozygote family members as in f. h,
Dot plot summaries of Treg cell suppressive ability in cells from healthy
donors (n = 3 with single (top) or 12 (bottom) technical replicates),
IL2RA-deficient compound heterozygotes (f, n = 3 total human subjects)
and IL2RA+/− c.530 or c.800 heterozygotes (g, n = 4 total human
subjects). Although CD3+CD4+CD127loCD45RO+TIGIT+ Treg cells from
compound heterozygotes showed no suppressive ability, conventional
CD4+IL-2RαhiCD127lo Treg cells from the single heterozygote family
members showed some suppressive capacity, consistent with their lack of a
pronounced clinical phenotype compared to the compound heterozygotes.
Thus, correcting functional IL-2Rα expression on the surface of FOXP3+
T cells from these patients may represent a viable approach for developing 
an ex vivo gene therapy. Mean value is displayed. i, Initial genetic testing 
of the proband (Supplementary Note 3) using an in-house targeted next- 
generation sequencing multi-gene panel of over 40 genes known to be 
involved in monogenic forms of diabetes was negative. Subsequent exome 
sequencing in the trio of proband and parents revealed two causative 
mutations in the IL2RA gene. The mother possessed a single heterozygous 
mutation (c.530G>A) in exon 4 of IL2RA, resulting in a premature stop 
codon. The father possessed a single heterozygous mutation (c.800delA) in 
exon 8 of IL2RA, resulting in a frameshift mutation leading to a 95 amino 
acid long run-on. Sanger sequencing confirmed that the proband was a 
compound heterozygote with both mutations. A gRNA was designed to 
cut adjacent to the site of each mutation, 8 bp away for c.530 mutation 
(blue), and 7 bp away for c.800 (red). For each mutation, an HDR template 
was designed including the corrected sequence (green) as well as a silent 
mutation in a degenerate base to disrupt the PAM sequence (NGG) for 
each guide RNA. Displayed genomic regions (not to scale) for c.530 
mutation site (hg38 ch10:6021526-6021557) and c.800 mutation site (hg38
ch10:6012886-6012917). 
Extended Data Fig. 7 | HDR-mediated correction of IL2RA c.530A>G
loss-of-function mutation. a, Unlike the gRNA targeting the c.800delA
mutation at the C terminus of IL-2Rα (Extended Data Fig. 8), the
gRNA targeting the c.530A>G mutation (causing a stop codon in an
interior exon) results in substantial (~90%) loss of IL-2Rα cell surface
expression in a healthy donor and single heterozygotes (c.800 het 2 and 3)
2 days after electroporation of the RNP alone (blue) into CD3+ T cells.
Although starting from a very small IL-2Rα+ percentage, this reduction
was observed in all three compound heterozygotes, potentially because
a small amount of protein can be surface expressed from the c.800delA
allele. This reduced IL-2Rα expression could be partially rescued by
inclusion of an ssODN HDR template (green) and even more substantially
rescued using a large dsDNA HDR template (yellow). Both template types
contained the corrected sequence, a silent mutation to remove the gRNA
PAM sequence, and either 60 bp (ssODNs) or ~300 bp (large dsDNA)
homology arms (Extended Data Fig. 6i). b, Amplicon sequencing of the
c.530 site in select patients shows the correlation between IL-2Rα cell
surface expression and genomic correction. Small numbers of reads in
the ‘no electroporation’ and ‘RNP only’ conditions were called as HDR,
potentially owing to small amounts of cross-well contamination.
c, Increased pSTAT5 in response to IL-2 stimulation (200 U ml−1) 7 days
after electroporation in CD3+ T cells from compound heterozygote
patients undergoing HDR-mediated mutation correction compared to
no electroporation or RNP only controls. pSTAT5+ cells correlated with
increased IL-2Rα surface expression. d, Similarly, increased proportions
of IL-2Rα+FOXP3+ cells are seen 9 days after electroporation in the
HDR correction conditions in compound heterozygote patients. Lower
percentages of correction were seen when targeting the c.530 mutation
for HDR correction in compound heterozygote 3, potentially due to an
altered cell state associated with the patient's disease or the patient’s
immunosuppressive drug regimen (Supplementary Table 4). e, Mutation
correction was possible in sorted Treg-like cells from the affected patients.
CD3+CD4+CD127loCD45RO+TIGIT+ Treg cells, a population highly
enriched for FOXP3+ cells (Extended Data Fig. 6e), identified without the
traditional Treg cell IL-2Rα surface marker (absent due to the causative
mutations), were FACS-sorted and underwent correction of the c.530A>G
mutation using a Cas9 nuclease and short ssDNA HDR template (ssODN).
After 12 days in culture, during which time the cells expanded more than
100-fold, greater than 20% (compound het 1) and 40% (compound het 2)
of targeted cells expressed IL-2Rα on their surface, demonstrating
functional correction and expansion of a therapeutically relevant cell type.
In these experiments, expansion was less robust for cells from compound
het 3. f, After 12 days in culture, corrected Treg cells from compound
heterozygote 2, and a female healthy control, were sorted based on
IL-2Rα and CD62L expression. Methylation of the TSDR (Treg-cell-specific
demethylated region) of FOXP3 intron 1 was analysed in the indicated
sorted cell populations by bisulfite sequencing (Epigendx). Owing to
X-chromosome inactivation, incomplete demethylation is observed in the
control Treg cell populations from the female healthy donor. The sorted
IL-2RαhiCD62Lhi population of corrected Treg cells showed increasing
TSDR demethylation, whereas similarly edited and expanded CD4+
T effector (Teff) cells did not show substantial TSDR demethylation in
the healthy donor or in corrected cells from compound heterozygote 2.
All electroporations were performed according to the optimized non-viral
genome targeting protocol (Methods). For ssODN electroporations,
100 pmol in 1 μl water was electroporated.
Extended Data Fig. 8 | HDR-mediated and non-HDR-mediated
correction of IL2RA c.800delA frameshift loss-of-function mutation.
a, Histograms of IL-2Rα surface expression in CD3+ T cells in all children
from a family carrying two loss-of-function IL2RA mutations, including
three compound heterozygotes that express minimal amounts of IL-2Rα
on the surface of the T cells (no electroporation, grey). Two days after
electroporation of an RNP containing a gRNA for the site of one of the two
mutations, a 1-bp deletion in the final exon of IL2RA (c.800delA) causing
a run-on past the normal stop codon, CD3+ T cells from a healthy donor
and single heterozygotes (c.800 het 2 and 3) showed slight increases in
IL-2Rα− cells (RNP only, blue). This modest change is potentially due to
the gRNA targeting the C terminus of the protein, in which small indels
may cause less pronounced loss of surface protein expression. Notably,
the RNP alone resulted in IL-2Rα surface expression in almost 50% of
edited T cells in all three compound heterozygotes. In cells from two of the
compound heterozygous children, increases in the percentage of cells with
IL-2Rα correction compared to RNP only could be achieved by inclusion
of an ssODN HDR template sequence with the mutation correction
(RNP plus ssODN, green), and further increased at this site when using a
longer dsDNA HDR template to correct the mutation (RNP plus dsDNA
HDRT, yellow) (Extended Data Fig. 6i). b, Amplicon sequencing was
performed in select targeted patient cells. c, pSTAT5 in response to
high-dose IL-2 stimulation (200 U ml−1) in targeted CD3+ T cells after 7 days
of expansion post-electroporation. Increased numbers of pSTAT5+ cells
correlated with increased IL-2Rα surface expression (a). d, After 9 days of
expansion post-electroporation, intracellular FOXP3 staining revealed an
increased proportion of IL-2Rα+FOXP3+ cells in CD3+ T cells compared
to no electroporation controls. Electroporations were performed
according to the optimized non-viral genome targeting protocol (Methods).
For ssODN electroporations, 100 pmol in 1 μl water was electroporated.
e, Flow cytometric analysis of GFP expression 6 days after electroporation
of a positive HDR control RAB11A–GFP dsDNA HDR template into
CD3+ T cells from the indicated patients revealed lower GFP expression
in the three compound heterozygotes compared to their two c.800
heterozygote siblings. Compared to a cohort of 12 similarly edited
healthy donors (Fig. 1d), both c.800 heterozygotes as well as compound
heterozygotes 1 and 2 were within the general range observed across
healthy donors, whereas compound heterozygote 3 had lower GFP
expression than any healthy donor analysed. Of note, in compound
heterozygote 3, HDR-mediated correction at the c.530 mutation was
substantially lower than in the other two compound heterozygotes (Fig. 3b).
IL-2Rα surface expression after electroporation of the c.800delA targeting
RNP alone was similar, though. Compared to HDR-mediated repair,
NHEJ-mediated frameshift correction at c.800delA may be less dependent
on cell proliferation, consistent with compound heterozygote 3 being the
only compound heterozygous patient on active immunosuppressants at
the time of blood draw and T cell isolation (Supplementary Note 3).
f, Altered cell-state associated with the patient's disease could also
contribute to diminished HDR rates. TIGIT and CTLA4 expression
levels in non-edited, isolated CD4+ T cells from each indicated patient
were measured by flow cytometry. Consistent with altered cell states
and/or cell populations, cells from compound heterozygote 3 had a distinct
phenotype, with increased TIGIT and CTLA4 expression compared
to healthy donors, the single heterozygous family members, as well as the
other two compound heterozygous siblings.
Extended Data Fig. 9 | Endogenous TCR replacement strategy and
functional characterization. a–d, Schematic description of HDR template
for endogenous TCR replacement by in-frame integration of a new TCR-β
chain and a new variable region of a TCR-α chain at the TCR-α locus, and
subsequent transcription and translation of the new TCR. e, HDR template
for endogenous TCR replacement at the TCR-β locus. f, Multiplexed
integration of a new TCR-α at the TCR-α locus and a new TCR-β at the
TCR-β locus. See Supplementary Note 4 for detailed description of TCR
replacement strategy. g, TCR mispair analysis after retroviral delivery or
non-viral TCR replacement of an NY-ESO-1-specific TCR in gated CD4+
or CD8+ T cells. With viral introduction of the new TCR, an infected cell
will potentially express at least four different TCRs (new TCR-α plus new
TCR-β; new TCR-α plus endogenous TCR-β; endogenous TCR-α and
new TCR-β; endogenous TCR-α plus endogenous TCR-β). Staining for
the specific beta chain in the newly introduced TCR (Vβ13.1) along with
MHC-peptide multimer (NY-ESO) can provide a rough estimate of TCR
mispairing by distinguishing between cells that predominantly expressed
the introduced TCR (Vβ13.1+NY-ESO+; new TCR-α and new TCR-β)
versus those that expressed predominantly one of the potential mispaired
TCRs (Vβ13.1+NY-ESO−; endogenous TCR-α and new TCR-β). h, i, TCR
replacement by targeting an entire new TCR into TRAC (a–d, also possible
with a multiplexed knockout of TCRB), an entire new TCR into TRBC1/2
(f), or multiplexed replacement with a new TCR-α into TRAC and a new
TCR-β into TRBC1/2. j, Functional cytokine production was observed
selectively after antigen exposure in gated CD4+ T cells, similarly to gated
CD8+ T cells (Fig. 4c). k, Non-viral TCR replacement was consistently
observed at four days after electroporation in both gated CD8+ and CD4+
T cells across a cohort of six healthy blood donors. l, In a second cohort of
six additional healthy blood donors, 100 million T cells from each donor
were electroporated with the NY-ESO-1 TCR replacement HDR template
and on-target gRNA/Cas9 (Fig. 4f). The percentage of CD4+ and CD8+ T
cells that were NY-ESO-1 TCR+ was consistent over 10 days of expansion
after electroporation. m, Over 10 days of expansion after non-viral genome
targeting, CD8+ T cells showed a slight proliferative advantage over CD4+
T cells. n, The indicated melanoma cell lines were co-incubated with the
indicated sorted T cell populations at a ratio of 1:5 T cells to cancer cells.
At 72 h after co-incubation, the percentage cancer cell confluency was
recorded by automated microscopy (in which nuclear RFP marks the
cancer cells). T cells expressing the NY-ESO-1 antigen-specific TCR, either 
by retroviral transduction (black) or by non-viral knock-in endogenous 
TCR replacement (red) both showed robust target cell killing only in the 
target cancer cell lines expressing both NY-ESO-1 and the HLA-A*0201 
class I MHC allele. o, To ensure that target cell killing by non-viral TCR 
replacement T cells (red) was not due to either the gRNA or the HDR 
template used for TCR replacement alone, a matrix of on/off target gRNAs 
and on/off target HDR templates was assayed for target cell killing of the 
NY-ESO-1+HLA-A*0201+ A375 cancer cell line (off-target gRNA and
HDRT were specific for RAB11A-GFP fusion protein knock-in). Only
cells with both the on-target gRNA as well as the on-target HDR template
demonstrated target cell killing. p, Sorted NY-ESO-1+TCR+ cells from a
bulk T cell edited population (on-target gRNA, on-target HDR template) 
showed a strong dose-response effect for target cancer cell killing. Within 
48 h, T cell to cancer cell ratios of 2:1 and greater showed almost complete 
killing of the target cancer cells. By 144 h, T cell to cancer cell ratios of less 
than 1:16 showed evidence of robust target cell killing. q, Target cell killing 
by non-viral TCR replacement T cells was due specifically to the
NY-ESO-1-recognizing TCR+ cell population observed by flow cytometry after
non-viral TCR replacement (Fig. 4b). Starting with the bulk edited T cell 
population (all of which had been electroporated with the on-target gRNA 
and HDR template), we separately sorted three populations of cells: the 
NY-ESO-1+TCR+ cells (non-virally replaced TCR) (red), the NY-ESO-1−
TCR− cells (TCR-knockout) (grey), and the NY-ESO-1−TCR+ cells (those
that probably retained their native TCR but did not have the
NY-ESO-1-specific knock-in TCR) (orange). Only the sorted NY-ESO-1+TCR+
population demonstrated target cell killing (4:1 T cell to cancer
cell ratio). One representative donor from n = 2 (g, j) or n = 3 (h, i)
independent healthy donors with mean and s.d. of technical triplicates
(j). Mean and s.d. of n = 6 independent healthy donors (l, m) or of four
technical replicates for n = 2 independent healthy donors (o–q) are shown.
Mean and individual values for n = 2 independent healthy donors (n).
Extended Data Fig. 10 | In vivo functionality of T cells with non-viral
TCR replacement. a, Diagram of in vivo human antigen-specific tumour
xenograft model. NSG mice (8–12 weeks old) were seeded with 1 × 10⁶
A375 cells (human melanoma cell line; NY-ESO-1 antigen+ and
HLA-A*0201+) subcutaneously in a shaved flank. Primary human T cells edited
to express an NY-ESO-1 antigen-specific TCR were generated (either
by lentiviral transduction or non-viral TCR replacement), expanded
for 10 days after transduction or electroporation, and frozen. Either a
bulk-edited population was used (b, c) or an NY-ESO-1 TCR+-sorted
population was used (d–f). At 7 days after tumour seeding, T cells were
thawed and adoptively transferred via retro-orbital injection. b, Two days
after transfer of 5 × 10⁶ bulk non-virally targeted T cells (~10% TCR+
NY-ESO-1+ (red), ~10% TCR+NY-ESO-1− (orange), and ~80% TCR−
NY-ESO-1− (green), see Fig. 4b), NY-ESO-1+ non-virally edited T cells
preferentially accumulated in the tumour versus the spleen. n = 5 mice
for each of four human T cell donors. c, Ten days after transfer of 5 × 10⁶
bulk non-virally targeted CFSE-labelled T cells, NY-ESO-1+TCR+ cells
showed greater proliferation than TCR− or TCR+NY-ESO-1− T cells, and
showed greater proliferation (CFSE low) in the tumour than in the spleen.
Ten days after transfer, TCR− and TCR+NY-ESO-1− T cells were difficult
to find in the tumour (Fig. 4g). d, Individual longitudinal tumour volume
tracks for data summarized in Fig. 4h. Sorted NY-ESO-1 TCR+ T cells
(3 × 10⁶) generated either by lentiviral transduction (black) or non-viral
and compared to vehicle-only injections until 24 days after tumour 
seeding. Note that the same vehicle control data are shown for
each donor in comparison to lentiviral delivery (above) and non-viral 
TCR replacement (below). e, f, In these experiments, 17 days after T cell 
transfer (d), non-virally TCR-replaced cells appeared to show greater 
NY-ESO-1 TCR expression and lower expression of exhaustion markers. 
Transfer of both lentivirally transduced and non-viral TCR replaced cells 
showed reductions in tumour burden on day 24. In this experimental 
model, non-viral TCR replacement showed further reductions compared 
to the lentiviral transduction (Fig. 4h), potentially due to knockout of the 
endogenous TCR, endogenous regulation of expression of the new TCR, 
some difference in the cell populations amenable to non-viral versus 
lentiviral editing, or confounding variables in cell handling between 
lentiviral transduction and non-viral genome targeting. n = 4 (b), n = 2
(d–f), or n = 1 (c) independent healthy donors in 5 (b, c) or 7 (d–f) mice
per donor with mean (b, e, f) and s.d. (b).
Mechanism of parkin activation by PINK1

Christina Gladkova1, Sarah L. Maslen1, J. Mark Skehel1 & David Komander1*


Mutations in the E3 ubiquitin ligase parkin (PARK2, also known as 
PRKN) and the protein kinase PINK1 (also known as PARK6) are 
linked to autosomal-recessive juvenile parkinsonism (AR-JP); at
the cellular level, these mutations cause defects in mitophagy, the
process that organizes the destruction of damaged mitochondria.
Parkin is autoinhibited, and requires activation by PINK1, which 
phosphorylates Ser65 in ubiquitin and in the parkin ubiquitin-like 
(Ubl) domain. Parkin binds phospho-ubiquitin, which enables 
efficient parkin phosphorylation; however, the enzyme remains 
autoinhibited with an inaccessible active site. It is unclear
how phosphorylation of parkin activates the molecule. Here we 
follow the activation of full-length human parkin by hydrogen- 
deuterium exchange mass spectrometry, and reveal large-scale 
domain rearrangement in the activation process, during which the 
phospho-Ubl rebinds to the parkin core and releases the catalytic
RING2 domain. A 1.8 Å crystal structure of phosphorylated human
parkin reveals the binding site of the phospho-Ubl on the unique 
parkin domain (UPD), involving a phosphate-binding pocket 
lined by AR-JP mutations. Notably, a conserved linker region 
between Ubl and the UPD acts as an activating element (ACT) that 
contributes to RING2 release by mimicking RING2 interactions on 
the UPD, explaining further AR-JP mutations. Our data show how 
autoinhibition in parkin is resolved, and suggest a mechanism for 
how parkin ubiquitinates its substrates via an untethered RING2 
domain. These findings open new avenues for the design of parkin 
activators for clinical use. 

Work in the past decade has shown how PINK1 and parkin initiate 
mitophagy, and many steps in this process are mechanistically well 
understood. It has further been suggested that targeted activation 
of either PINK1 or parkin could increase mitochondrial turnover 
and impede the progression of Parkinson's disease. A detailed under- 
standing of the underlying molecular mechanisms of these processes 
is therefore essential. 

Parkin requires an elaborate activation mechanism. The first crystal 
structures of parkin revealed several distinct mechanisms of 
autoinhibition (Fig. 1a, Extended Data Fig. 1a). Most strikingly, the active 
site Cys on the catalytic RING2 domain, which receives ubiquitin from 
the E2 enzyme, is obstructed by an interface with the UPD (also known 
as RING0) (Extended Data Fig. 1a). The RING2-UPD interface is 
highly hydrophobic (Extended Data Fig. 1b), and it is not clear how 
this intramolecular interaction can be opened. 

Activation of parkin is mediated by the mitochondrial outer 
membrane (MOM) Ser/Thr protein kinase PINK1, which phosphorylates 
Ser65 in ubiquitin (generating phospho-ubiquitin) and in 
the parkin Ubl domain. A current model for PINK1-mediated 
activation of parkin suggests that PINK1 phosphorylates ubiquitin 
attached to MOM proteins, and autoinhibited, cytosolic parkin is 
recruited with nanomolar affinity to sites of PINK1 activity. 
Binding of phospho-ubiquitin induces conformational changes in parkin 
that lead to the release of the Ubl domain from the parkin core, 
and enable PINK1 to phosphorylate the parkin Ubl domain 
(Fig. 1a, Extended Data Fig. 1c). Notably, in structures of parkin bound 
to phospho-ubiquitin, parkin is still autoinhibited; the E2 binding 
site remains blocked by the repressor (REP) element, and RING2 and 


its catalytic Cys remain obstructed by the UPD (Fig. 1a, Extended 
Data Fig. 1c). 

Indeed, full activation of parkin requires phosphorylation of its Ubl. A 
parkin S65A mutant is not retained at mitochondria, is unable to trigger 
mitochondrial ubiquitination and mitophagy, and thus is 
physiologically inactive. Biochemically, parkin phosphorylation 
enhances activity to a greater extent than binding of 
phospho-ubiquitin, and parkin phosphorylation, but not 
phospho-ubiquitin binding, enables ubiquitin activity-based probes (Ub-ABPs) 
to access the active site Cys. How Ubl phosphorylation is able to 
activate parkin, and in particular, how it can disrupt the RING2-UPD 
interface, has remained unknown, and this has led to various models 
of parkin activation. 

We reconstituted activation of full-length human parkin by PINK1, 
and followed domain rearrangements by hydrogen-deuterium 
exchange mass spectrometry (HDX-MS) (Fig. 1b-e, Extended 
Data Figs. 2, 3). HDX-MS reports on the relative rate of exchange of 
backbone amide hydrogens with deuterium, based on the strength 
of hydrogen bonding and solvent accessibility in the folded protein, 
and distinguishes peptides in a protein’s core (which show no or little 
exchange with solvent over time) from those at an exposed surface 
(which show high or increasing exchange with solvent over time). The 
power of the method lies in its ability to compare identical peptides 
between different states along an activation cascade, revealing pep- 
tides that become exposed and thus interfaces that are opened (red 
in Fig. 1b-e), and regions in the protein that become protected and 
form new interfaces (blue in Fig. 1b-e). For parkin, this allowed us to 
confirm previously reported conformational changes upon phospho- 
ubiquitin binding, whereby the parkin Ubl is released and becomes 
exposed to solvent (numbers 1 and 2 in Fig. 1b), the phospho-ubiquitin 
binding site becomes protected (3), and RING2, REP (4) and UPD are 
essentially unperturbed (Fig. 1b, Extended Data Fig. 3a). 
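
The comparison of identical peptides between two states, as described above, can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline: the data layout, the function name `difference_map`, the 0.5 Da threshold and the example uptake values are all assumptions chosen for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical layout: {(peptide_id, exchange_time_s): mean deuterium uptake in Da}
Uptake = Dict[Tuple[str, float], float]


def difference_map(previous: Uptake, current: Uptake,
                   threshold: float = 0.5) -> Dict[Tuple[str, float], str]:
    """Classify every peptide/time point present in both states.

    A positive difference (current - previous) means more exchange in the
    current state, i.e. the peptide became more exposed ("red"); a negative
    difference means it became more protected ("blue"). The threshold is an
    arbitrary illustrative cut-off, not a value taken from the paper.
    """
    classes = {}
    for key in previous.keys() & current.keys():
        delta = current[key] - previous[key]
        if delta > threshold:
            classes[key] = "exposed"
        elif delta < -threshold:
            classes[key] = "protected"
        else:
            classes[key] = "unchanged"
    return classes


# Invented uptake values for three peptides at the 3 s time point
before = {("pep1", 3.0): 1.0, ("pep2", 3.0): 4.0, ("pep3", 3.0): 2.0}
after = {("pep1", 3.0): 2.5, ("pep2", 3.0): 2.0, ("pep3", 3.0): 2.2}
print(difference_map(before, after))
```

Only peptides measured in both states are compared, which mirrors the requirement that the same peptide be followed along the activation cascade.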

Phosphorylation of parkin initiates release of REP and RING2 (4, 5), 
especially at later time points, but the phosphorylated Ubl also remains 
flexible and in exchange with solvent (1) (Fig. 1c, Extended Data 
Fig. 3b). The behaviour of phospho-Ubl changes markedly when a 
covalent, non-dischargeable E2-ubiquitin conjugate is added to the 
sample; now, the C-terminal RING2 peptide at the UPD interface is 
exposed to solvent (5), and the phosphorylated Ubl becomes protected 
(1), indicating the formation of a new interface (Fig. 1d, Extended Data 
Figs. 2, 3c). Finally, charging of the catalytic Cys of RING2 by ubiquitin 
was assessed using phosphorylated parkin covalently modified with the 
Ub-ABP ubiquitin-vinylsulfone (Ub-VS) (see Methods, Extended 
Data Fig. 2a-c). 'Charged' phospho-parkin reiterates the 
conformational changes observed in the phospho-parkin E2-Ub-bound sample 
(Fig. 1e, Extended Data Fig. 3d), showing that the ubiquitin-modified 
RING2 had been fully released from the parkin core (5). Overall, the 
HDX-MS experiments indicated that there were considerable re- 
arrangements of Ubl and RING2, with loss of old and formation of 
new intramolecular interfaces on the parkin core (Fig. 1, Extended 
Data Figs. 2, 3). 

Unexpectedly, a section of the linker between Ubl and UPD was 
protected during rebinding of the phospho-Ubl (6) (Fig. 1d, e, Extended 
Data Figs. 2, 3). This region of parkin, spanning amino acids 75-145, 
Fig. 1 | Domain rearrangements in parkin, resolved by HDX-MS. 

a, Cartoon of parkin activation. Left, parkin is autoinhibited by several 
mechanisms (red circles). Middle, binding of phospho-ubiquitin (pUb) 
to parkin releases the Ubl domain, but most mechanisms of autoinhibition 
remain. Right, after Ubl phosphorylation, parkin is fully active (green 
circles), but a structure of active parkin has not been reported. Also see 
Extended Data Fig. 1. b-e, HDX-MS difference maps with the shortest 
peptides covering any given region, coloured from blue (more protected 
from exchange compared to previous state) to red (more accessible to 
solvent exchange). Peptides for grey regions could not be analysed 

(see Extended Data Fig. 3). The five columns per sample indicate different 
time lengths for hydrogen-deuterium exchange (0.3 s, 3 s, 30 s, 300 s and 
3,000 s). All experiments were performed with human full-length parkin, 
as technical triplicates. See Extended Data Figs. 2, 3 for raw data and 
structural mapping, respectively. b, Difference between parkin and parkin 
bound to phospho-ubiquitin. c, Difference between parkin-phospho- 
ubiquitin and phospho-parkin-phospho-ubiquitin. d, Difference between 
phospho-parkin-phospho-ubiquitin, and phospho-parkin-phospho- 
ubiquitin bound to a non-dischargeable UBE2L3-ubiquitin (Ub) complex 
(see Methods). e, Difference between phospho-parkin-phospho- 
ubiquitin and phospho-parkin-phospho-ubiquitin charged with Ub-VS 
(see Methods). 


has remained unstudied as it is disordered in full-length parkin 
and was removed in subsequent structures of human and rat 
parkin. 

The Ubl-UPD linker contains two connected, short sections of 
highly conserved residues that are flanked by a variable number of 
unconserved residues (Extended Data Fig. 4). A minimal linker is pres- 
ent in Thamnophis sirtalis (Ts) parkin (garter snake parkin, sequence 
identity to human parkin 73%, Extended Data Fig. 4), and Tsparkin 
Fig. 2 | Structure of the phosphorylated parkin core. a, Schematic for 
obtaining a crystallizable phosphorylated parkin core. Scissors indicate the 
introduction of a TEV protease cleavage site after the IBR domain (amino 
acid 382). b, Crystal structure at 1.80 Å of the human phosphorylated 
parkin core lacking RING2, bound to phospho-ubiquitin. Phosphorylated 
residues are shown in ball-and-stick representation. A cartoon 
representation similar to a is shown to the right. Also see Extended Data 
Fig. 6 and Extended Data Table 1. 


was used for comparative studies. HDX-MS revealed highly similar 
changes upon ubiquitin charging in phosphorylated Tsparkin when 
compared to human parkin (Extended Data Fig. 5a, with Fig. 1e). 
Moreover, limited proteolysis of full-length Tsparkin revealed that 
autoinhibited, unphosphorylated Tsparkin was cleaved first in the Ubl-UPD 
linker, whereas phosphorylated Tsparkin was cleaved first in the 
IBR-RING2 linker, and was not efficiently cleaved in the Ubl-UPD linker 
(Extended Data Fig. 5b). After cleavage of phospho-Tsparkin, RING2 
was no longer stably associated with the parkin core (Extended Data 
Fig. 5c). Together, these data again strongly suggest that the unstudied 
Ubl-UPD linker becomes ordered in activated parkin, whereas REP 
and RING2 are dislodged, and RING2 becomes mobile. 

We realized that crystallographic analysis of active parkin was 
likely to be impeded by a mobile RING2 domain, and this inspired 
new construct design. Parkin is insoluble when expressed without 
the RING2 domain (data not shown), probably owing to the exposed, 
hydrophobic UPD (Extended Data Fig. 1b). Hence, we engineered 
a tobacco etch virus (TEV) cleavage site into the IBR-RING2 linker 
(Fig. 2a, see Methods). This enabled us to remove the RING2 domain 
upon phospho-ubiquitin binding and Ubl phosphorylation (Extended 
Data Fig. 5d). Notably, Ub-VS-charged Tsparkin and Tsparkin lacking 
RING2 (TsparkinΔRING2) displayed identical difference HDX-MS 
profiles, indicating that removal of the mobile RING2 had no effect on 
the remaining molecule (Extended Data Fig. 5e). For human parkin, 
the resulting covalent phospho-parkinΔRING2-phospho-ubiquitin 
(hereafter phospho-parkin-phospho-ubiquitin) complex was 
crystallized, and resulted in a 1.8 Å structure (Fig. 2b, Extended Data Table 1, 
Extended Data Fig. 6). 

The structure of phospho-parkin-phospho-ubiquitin (Fig. 2b) 
revealed a near-identical organization of the parkin core (UPD- 
RING1-IBR) bound to phospho-ubiquitin, as compared to previous 
structures (r.m.s.d. 0.73 Å with human parkin-phospho-ubiquitin, 
PDB 5N2W) (Extended Data Fig. 7a), and there were no large 
conformational changes in individual domains. Modelling of an open E2-Ub 
conjugate structure reveals sensible interfaces (Extended Data Fig. 7b) 
Fig. 3 | The parkin UPD-phospho-Ubl interaction. a, Structural detail 
of the binding site between parkin phospho-Ubl (green) and UPD 

(blue). Key residues are shown, and phospho-Ser65 is highlighted. Grey 
spheres indicate Zn atoms, and hydrogen bonds are shown as dotted 
lines. b, Ub-VS probe reactivity of the RING2 catalytic Cys residue with 
parkin-phospho-ubiquitin, phospho-parkin (pparkin), or phospho- 
parkin(K211N). The experiment was done in duplicate with identical 
results; for gel source data, see Supplementary Fig. 1. c, HDX-MS analysis 
of phospho-parkin-phospho-ubiquitin in comparison to phospho- 
parkin(K211N)-phospho-ubiquitin. The C-terminal peptide profiles 

are compared (see Extended Data Fig. 7d for overall data). The RING2 

C terminus remains solvent-protected in the phospho-parkin(K211N) 
background. Technical triplicates are shown for all time points. 

d, Superposition of parkin-phospho-ubiquitin (PDB 5N2W), and 
phospho-parkin-phospho-ubiquitin showing the relative positions of the 
RING2 (cyan surface) and phospho-Ubl (green surface), respectively, on 
the UPD domain. 




and corroborates the ubiquitin binding site observed in HDX-MS 
(Fig. 1d, Extended Data Fig. 3c). 

Notably, the phosphorylated Ubl domain was bound to the UPD, 
and had moved by more than 50 Å from its position in autoinhibited 
parkin (Figs. 2b, 3a, Extended Data Fig. 7a). The interface between 
phospho-Ubl and UPD is mediated by a common interaction site of 
ubiquitin-fold modifiers, the hydrophobic Ile44 patch of the Ubl, and 
engulfs the elongated UPD domain covering a surface of more than 
800 Å²; this interface can be recapitulated in HDX-MS data (Fig. 1d, e, 
Extended Data Fig. 7c). Furthermore, the interaction places phospho- 
rylated Ser65 into a positively charged pocket on the UPD (Fig. 3a). 
The phospho-acceptor pocket is lined by Lys161, Arg163 and Lys211, 
which contact the phosphate group and form four hydrogen bonds. 
We had previously noted this putative phosphate-acceptor binding 
site, the importance of which is highlighted by two mutations found 
in patients with AR-JP (K211N and K161N) that also abrogate the 
function of parkin in mitophagy. Mechanistically, phosphorylated 
parkin with a K211N mutation blocking the phospho-acceptor pocket 
was no longer modified by Ub-VS (Fig. 3b). HDX-MS confirmed 
that phospho-parkin(K211N)-phospho-ubiquitin showed little sign 
of RING2 release and had the strongest relative solvent protection in 
the C terminus, where RING2 binds the UPD (Fig. 3c, Extended Data 
Fig. 7d). This indicated that the catalytic Cys of the RING2 domain 
remained inaccessible if phospho-Ubl was unable to interact with its 
UPD binding site, and explained how AR-JP-causing K211N or K161N 
mutations produce parkin variants that cannot be activated by Ser65 
phosphorylation. 

The position of the Ubl on the UPD overlaps only marginally with 
the position of RING2 in autoinhibited states of parkin, and while 
binding of both would lead to steric clashes (Fig. 3d, Extended Data 
Fig. 7a), the hydrophobic RING2 binding site (Extended Data Fig. 1b) 
would remain unusually exposed upon opening of the RING2-UPD 
interface. In our structure, clear electron density for a stretch of resi- 
dues was apparent at the RING2-binding site of the UPD (Extended 
Data Fig. 6c), and we could unambiguously assign this density to the 
sequence corresponding to the first conserved region of the Ubl-UPD 
linker (Fig. 4, Extended Data Figs. 4, 6). In particular, residues Leu102, 
Val105 and Leu107 occupy pockets previously bound by RING2 
residues Met458, Trp462 and Phe463 (Fig. 4a, b). Hence the Ubl-UPD 
linker shields the hydrophobic patch on the UPD that was opened by 
release of RING2. Indeed, similar to the K211N mutation, 
phospho-parkin with deletion of the first set of conserved linker residues 
(Δ101-109) was unable to be charged by Ub-VS (Fig. 4c). 

The linker provides additional contact points for the phospho-Ubl 
interface. Arg104 is located between two key hydrophobic residues, and 
contacts with its side chain the Ser65 loop in phospho-Ubl. Notably, 
parkin(R104W) is a mutation found in patients with AR-JP, and 
we would predict that this mutation would disrupt or misalign the 
observed hydrophobic interactions. A phospho-parkin(R104A) mutant 
was charged less efficiently by Ub-VS (Fig. 4d), showed slower E2-Ub 
discharge activity (Extended Data Fig. 8a, b) and reduced in vitro poly- 
ubiquitination activity (Fig. 4e), whereas its thermal stability remained 
unperturbed (Extended Data Fig. 8c). 

Together, structural, biochemical and patient data confirm the cru- 
cial importance of the first conserved stretch of the Ubl-UPD linker 
for parkin activity, and define a new activating element, which we term 
ACT, in this understudied regulatory region of parkin, which also con- 
tains several phosphorylation sites (see further discussion in Extended 
Data Fig. 8d). 

Our work resolves the activation mechanism of parkin, finally visu- 
alizing large-scale domain rearrangements and showing that the parkin 
Ubl switches between an inhibitory position in the unphosphorylated 
molecule to an activating position in phosphorylated parkin. Our data 
are consistent with a model in which the phosphorylated Ubl and the 
ACT element in the Ubl-UPD linker dislodge RING2 from its 
autoinhibited position, enabling it to be charged by E2-Ub, and 
ubiquitinate substrates in its vicinity independently of the parkin core (Fig. 4f). 
Notably, our model does not require parkin dimerization. 

Our structure of an activated parkin core will inform drug discovery 
efforts that have set out to identify parkin activators. With the realiza- 
tion that the RING2-UPD interface opens and exposes a hydropho- 
bic pocket, small molecules could be directed towards this interface. 
Such molecules may become particularly useful to restart mitophagy 
in patients with AR-JP who carry parkin variants that are not activated 
by PINK1-mediated Ubl phosphorylation. 
Fig. 4 | An activating element (ACT) in parkin. a, Structural detail of 

the ordered ACT within the parkin-phospho-Ubl-UPD linker. Three 
hydrophobic ACT residues bind the hydrophobic UPD groove, and polar 
ACT residues contact phospho-Ubl. b, Superposition of the ACT with 
RING2 (PDB 5N2W, semi-transparent) in the same orientation as in a. 
Hydrophobic ACT residues mimic RING2 interactions. c, Ub-VS charging 
assay of phospho-parkin, and phospho-parkin variants lacking the ACT 
(Δ101-109) or the second conserved hydrophobic linker sequence 
(Δ116-123). Experiments were performed in duplicate with identical results; 

for gel source data, see Supplementary Fig. 1. d, Ub-VS charging assay as 
in c for wild-type (WT) phospho-parkin or the R104A mutant. Patients 
with parkin(R104W) suffer from AR-JP. Experiments were performed in 
duplicate with identical results; for gel source data, see Supplementary Fig. 1. 
METHODS 
Molecular biology. cDNA of Thamnophis sirtalis (Ts) parkin was obtained from 
GeneArt (Invitrogen) with codon-optimization for bacterial expression and cloned 
into a pOPIN-K vector, using the In-Fusion HD Cloning Kit (Takara Clontech). 
Human (Hs)parkin and Pediculus humanus corporis (Ph)PINK1 constructs were 
also expressed from a pOPIN-K vector, while UBE2L3 was expressed from a 
pGEX6 vector. HsUBE1/pET21d was a gift from C. Wolberger (Addgene plasmid 
#34965). 

Site-directed mutagenesis was carried out using the QuikChange protocol 
with Phusion polymerase. A TEV cleavage site was introduced into the parkin 
constructs using the NEB Q5 Site-Directed Mutagenesis Kit (NEB). In Tsparkin, 
residues 368-374 (KSPGATA) were replaced by the ENLYFQS TEV cleavage 
sequence, while in Hsparkin residues 382-388 (EASGTTT) were replaced by the 
TEV cleavage sequence to yield cleavable constructs. 
Protein purification. For parkin expression, Escherichia coli Rosetta2 pLacI cells 
(Novagen) were grown in 2× TY medium at 37°C. At OD600 = 0.6 the temperature 
was reduced to 18°C; expression was induced at OD600 = 0.8-1.0 with 30 µM IPTG 
and the medium supplemented with 200 µM ZnCl2. Cells were harvested after 
overnight growth at 18°C and frozen at −20°C. 

For parkin purification, cells were resuspended in lysis buffer (300 mM 
NaCl, 10% (w/v) glycerol, 25 mM Tris (pH 8.5), 14.3 mM β-mercaptoethanol) 
supplemented with 2 mg/ml lysozyme, 0.2 mg/ml DNaseI and 80 µg/ml PMSF. 
The suspension was homogenized using an EmulsiFlex-C3 (Avestin) for two 
passes at ~15,000 p.s.i. and cleared by centrifugation at 46,000g for 35 min at 
4°C. The clarified lysate was applied to Amintra glutathione resin (Expedeon), 
resin washed with high salt buffer (25 mM Tris (pH 8.5), 500 mM NaCl, 10 mM 
DTT) and GST-fusion parkin cleaved from the resin overnight at 4°C with GST-3C 
protease. 

Samples were eluted and resin washed with no-salt buffer (25 mM Tris (pH 8.5), 
10 mM DTT). All following purification steps were carried out on an Akta Pure 
system (GE Healthcare). Pooled fractions were subjected to anion-exchange 
chromatography on a 6-ml Resource Q column (GE Healthcare) with a 0-25% linear 
gradient from buffer A (25 mM Tris (pH 8.5), 10 mM DTT, 50 mM NaCl) to buffer B 
(25 mM Tris (pH 8.5), 10 mM DTT, 1,000 mM NaCl) over 15 column volumes. 
For phosphorylated parkin, the resulting sample was phosphorylated using a 
1:100 molar ratio of GST-PhPINK1 in phosphorylation buffer (10 mM ATP, 
10 mM MgCl2, 200 mM NaCl, 50 mM Tris (pH 8.5), 10 mM DTT). PINK1 was 
subsequently removed by incubation with Amintra glutathione resin (Expedeon) 
and phosphorylated parkin purified using anion exchange chromatography as 
above. Finally, samples were subjected to size-exclusion chromatography (HiLoad 
16/600 Superdex 75 pg, GE Healthcare) into buffer C (25 mM Tris (pH 8.5), 
10 mM DTT, 200 mM NaCl). 
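
As a worked illustration of the anion-exchange step above, the following Python sketch computes the NaCl concentration reached along a linear gradient from the stated buffer compositions (50 mM NaCl in buffer A, 1,000 mM in buffer B, 0-25% B over 15 column volumes). The function names are hypothetical, and ideal linear mixing is assumed.

```python
def percent_b_at(column_volume: float, start_pct: float,
                 end_pct: float, total_cv: float) -> float:
    """Percentage of buffer B after a given number of column volumes,
    assuming a perfectly linear gradient."""
    if not 0 <= column_volume <= total_cv:
        raise ValueError("column_volume must lie within the gradient")
    return start_pct + (end_pct - start_pct) * column_volume / total_cv


def salt_at_percent_b(percent_b: float, salt_a_mm: float,
                      salt_b_mm: float) -> float:
    """Salt concentration (mM) of an A/B mixture at the given %B."""
    fraction_b = percent_b / 100.0
    return salt_a_mm * (1.0 - fraction_b) + salt_b_mm * fraction_b


# Gradient as described: 0-25% B over 15 column volumes
for cv in (0.0, 7.5, 15.0):
    pct = percent_b_at(cv, 0.0, 25.0, 15.0)
    print(cv, pct, salt_at_percent_b(pct, 50.0, 1000.0))
```

Under these assumptions the gradient spans 50 mM NaCl at the start to 287.5 mM NaCl at 25% B.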

In short, HsUBE1 was purified as follows. An N-terminal GST-Ub fusion 
protein was expressed and lysed in β-mercaptoethanol-free lysis buffer and applied to 
Amintra glutathione resin (Expedeon). Upon washing, the resin was equilibrated 
with 50 mM Tris (pH 8.5) and 2 mM ATP. HsUBE1 β-mercaptoethanol-free 
clarified lysate was generated as above, supplemented with 10 mM ATP and 10 mM MgCl2 
and incubated with the GST-Ub fusion-bound glutathione resin at room 
temperature for 30 min. The resin was then washed with DTT-free high salt buffer 
supplemented with 5 mM MgCl2. HsUBE1 was eluted in DTT-containing buffer and 
protein-containing fractions were applied to anion-exchange and size-exclusion 
chromatography as above. UBE2L3, UBE2D3 and GST-PhPINK1 were purified 
as described previously. 

Generation of non-dischargeable E2-Ub complex. UBE2L3 (C86K) and 
ubiquitin were stored in charging buffer (25 mM CAPSO (pH 9.5), 20 mM MgCl2, 
150 mM NaCl). UBE2L3 (C86K) (450 µM) was incubated with Ub (900 µM) and 
HsUBE1 (2.5 µM) in charging buffer supplemented with 10 mM ATP at 37°C 
overnight. The resulting mixture was applied to size-exclusion chromatography 
as above in buffer C. Fractions containing UBE2L3-Ub were pooled, concentrated 
and again applied to size-exclusion chromatography to remove free UBE2L3. 

Ub-VS generation and parkin coupling. Ub(1-75)-MesNa was prepared as 
described previously. H-Gly-VS hydrochloride was a kind gift from H. Ovaa 
and B.-T. Xin (Leiden University). Ub-MesNa, stored in buffer D (20 mM HEPES, 
50 mM sodium acetate (pH 6.5), 75 mM NaCl) at ~20 mg/ml, was used to dissolve 
~50 mg H-Gly-VS hydrochloride together with ~30 mg of N-hydroxysuccinimide 
(Fluka), acting as a catalyst. The pH was raised to 8.5 by addition of ~60 µl of 4 M 
NaOH and reaction incubated at 37 °C. Reaction progress was monitored by LC-MS 
analysis. When the ratio of Ub(1-75)-VS to hydrolysed Ub-MesNa product 
was ~1:1, with a minimum formation of the doubly coupled, Ub(1-75)-VS-VS 
species, the reaction was quenched by addition of 20 µl 12 M HCl (~30 min). The 
subsequent sample was diluted in 50 mM sodium acetate (pH 4.5) and applied 
to cation-exchange chromatography on a 1-ml MonoS column (GE Healthcare) 
with a 10-35% linear gradient between 50 mM sodium acetate (pH 4.5) containing 
0 M and 1 M NaCl, respectively. The resulting fractions were analysed by LC-MS 
and Ub(1-75)-VS containing fractions were pooled and applied to size-exclusion 
chromatography as above in buffer D. 

For quantitative parkin-Ub-VS coupling, phospho-parkin was purified as above 
where 10 mM DTT in buffer C was replaced with 5 mM TCEP. Phospho-parkin 
and Ub-VS were mixed at a 1:3 molar ratio and incubated at room temperature. 
Reaction progress was monitored by LC-MS analysis and, upon completion, the 
reaction was quenched by addition of DTT (~30 min). The resulting sample was 
purified using size-exclusion chromatography (buffer C). 

Mass-spectrometry analysis. LC-MS analysis was carried out on an Agilent 1200 
Series chromatography system coupled to an Agilent 6130 Quadrupole mass 
spectrometer. Samples were eluted from a Phenomenex Jupiter column (5 µm, 300 Å, C4 
column, 150 × 2.0 mm) using an acetonitrile gradient + 0.2% (v/v) formic acid. 
Protein was ionized using an ESI source (3 kV ionization voltage), and spectra 
were analysed in positive ion mode with a mass range between 400 and 2,000 m/z. 
Averaged spectra were deconvoluted using Promass (Novatia, LLC) and plotted 
using GraphPad Prism (version 7). 

Limited proteolysis. Tsparkin, phospho-Tsparkin, phospho-Tsparkin-phospho-ubiquitin, 
and phospho-Tsparkin-phospho-ubiquitin charged with Ub-VS were 
purified as described above. A 1 mg/ml protein solution was mixed with a 5 µg/ml 
solution of elastase from the Proti-Ace Kit (Hampton Research) and incubated for 
1 h at room temperature. The reactions were quenched by addition of DTT- and 
iodoacetamide-containing LDS buffer, resolved on 4-12% SDS NuPAGE 
gradient gels (Invitrogen) and stained with Instant Blue SafeStain (Expedeon). 
Hydrogen-deuterium exchange mass spectrometry (HDX-MS). Complexes 
were formed on ice and incubated for 30 min to give a final parkin concentration 
of 10 µM. Deuterium-exchange reactions of parkin and the different complexes 
were initiated by diluting the protein in D2O (99.8% (v/v) D2O, ACROS, Sigma) in 
25 mM Tris (pH 8.5), 200 mM NaCl, 1 mM TCEP to give a final D2O percentage 
of ~95%. For all experiments, deuterium labelling was carried out at 23°C (unless 
otherwise stated) at five time points: 0.3 s (3 s on ice), 3 s, 30 s, 300 s and 3,000 s, in 
technical triplicate. The labelling reaction was quenched by the addition of chilled 
2.4% (v/v) formic acid in 2 M guanidinium hydrochloride and immediately frozen 
in liquid nitrogen. Samples were stored at −80°C before analysis. 
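
The relationship between the dilution into D2O buffer and the final D2O percentage can be sketched as below. This is a hedged illustration: the dilution factor used by the authors is not stated in the text, so the 20-fold value in the example is an assumption, as is the function name. The sketch also assumes that the protein stock contains no D2O and that volumes are additive.

```python
def final_d2o_fraction(stock_d2o_fraction: float, dilution_factor: float) -> float:
    """Final D2O volume fraction after diluting an H2O-based protein stock.

    dilution_factor is the ratio of final volume to protein stock volume,
    so the labelling buffer contributes (1 - 1/dilution_factor) of the volume.
    """
    if dilution_factor < 1:
        raise ValueError("dilution_factor must be at least 1")
    return stock_d2o_fraction * (1.0 - 1.0 / dilution_factor)


# Assumed 20-fold dilution into buffer made with 99.8% (v/v) D2O
print(round(final_d2o_fraction(0.998, 20.0) * 100, 1))
```

With these assumed inputs the final D2O content comes out close to the ~95% quoted above.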

The quenched protein samples were rapidly thawed and subjected to proteolytic 
cleavage with pepsin followed by reversed phase HPLC separation. In brief, the 
protein was passed through an Enzymate BEH immobilized pepsin column, 
2.1 × 30 mm, 5 µm (Waters, UK) at 200 µl/min for 2 min, and the peptic peptides were 
trapped and desalted on a 2.1 × 5 mm C18 trap column (Acquity BEH C18 
Vanguard pre-column, 1.7 µm, Waters). Trapped peptides were subsequently eluted 
over 11 min using a 3-43% gradient of acetonitrile in 0.1% (v/v) formic acid at 
40 µl/min. Peptides were separated on a reverse phase column (Acquity UPLC 
BEH C18 column 1.7 µm, 100 mm × 1 mm; Waters) and detected on a SYNAPT 
G2-Si HDMS mass spectrometer (Waters) over an m/z of 300 to 2,000, with the 
standard electrospray ionization (ESI) source with lock mass calibration using 
[Glu1]-fibrinopeptide B (50 fmol/µl). The mass spectrometer was operated at a 
source temperature of 80°C and a spray voltage of 2.6 kV. Spectra were collected 
in positive ion mode. 

Peptide identification was performed by MS^E using an identical gradient of 
increasing acetonitrile in 0.1% (v/v) formic acid over 11 min. The resulting MS^E 
data were analysed using Protein Lynx Global Server software (Waters, UK) with 
an MS tolerance of 5 ppm. 

Mass analysis of the peptide centroids was performed using DynamX software 
(Waters). Only peptides with a score >6.4 were considered. The first round of 
analysis and identification was performed automatically by the DynamX software; 
however, all peptides (deuterated and non-deuterated) were manually verified at 
every time point for the correct charge state, presence of overlapping peptides, 
and correct retention time. Deuterium incorporation was not corrected for back- 
exchange and represents relative, rather than absolute changes in deuterium levels. 
Changes in H/D amide exchange in any peptide may be due to a single amide or a 
number of amides within that peptide. 
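
The centroid-based measure of relative deuterium incorporation described above can be sketched in a few lines of Python. This is an illustrative calculation, not the DynamX implementation: the function names and the example isotope envelope are invented, and, as in the text, no back-exchange correction is applied.

```python
from typing import Sequence


def centroid_mz(mz: Sequence[float], intensity: Sequence[float]) -> float:
    """Intensity-weighted mean m/z of one peptide's isotopic envelope."""
    total = sum(intensity)
    if len(mz) != len(intensity) or total <= 0:
        raise ValueError("need matching m/z and positive intensities")
    return sum(m * i for m, i in zip(mz, intensity)) / total


def relative_uptake(centroid_labelled: float, centroid_unlabelled: float,
                    charge: int) -> float:
    """Relative deuterium uptake in Da: the centroid m/z shift times the
    charge state. Not corrected for back-exchange, so the value is relative
    rather than absolute."""
    return (centroid_labelled - centroid_unlabelled) * charge


# Invented envelope for a doubly charged peptide
undeuterated = centroid_mz([500.0, 500.5, 501.0], [1.0, 2.0, 1.0])
deuterated = centroid_mz([501.0, 501.5, 502.0], [1.0, 2.0, 1.0])
print(relative_uptake(deuterated, undeuterated, charge=2))
```

Because the centroid summarizes the whole peptide, a given shift may reflect one amide or several, which is the caveat stated above.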
Protein preparation for crystallization. TEV-cleavable parkin was purified as 
described above. An anion-exchange purified parkin sample (step 1) was incu- 
bated with Ub-C3Br in a 1:4 molar ratio and GST-PhPINK1 in a 9:1 molar ratio in 
phosphorylation buffer (10 mM ATP, 10 mM MgCl2, 200 mM NaCl, 50 mM Tris 
(pH 8.5), 10 mM DTT) at a final parkin concentration of ~70 µM, yielding a 
phosphorylated, phospho-ubiquitin conjugated, TEV-cleavable parkin sample (step 2). 
GST-PhPINK1 was subsequently removed using Amintra glutathione resin 
(Expedeon). The sample was subjected to His6-TEV cleavage overnight at 4°C 
(step 3). His6-TEV was removed using Ni-NTA agarose (Qiagen), sample diluted 
in buffer A and applied to anion-exchange and size-exclusion chromatography as 
described above. 

To generate Ub-C3Br, Ub(1-75)-MesNa was prepared as described 
previously. Ub-MesNa, stored in buffer D, was incubated with 0.2 g/ml 
3-bromopropylamine hydrobromide (Fluka) dissolved in PBS (pH 4.8) at 2:1 molar ratio 
with final Ub(1-75)-MesNa concentration of 445 µM. The coupling was carried 
out on ice for 30 min following addition of 50 µl 4 M NaOH to raise the pH to 
10.5. The reaction was quenched by addition of 12 µl of 12 M HCl and sample 
buffer exchanged using a disposable PD-10 desalting column (GE Healthcare) 
into buffer C. 

Crystallization. Initial crystals were found from crystallization experiments 
carried out at 18°C in 96-well sitting drop vapour diffusion plates in the 
MRC format (Molecular Dimensions) by mixing 100 nl of 4 mg/ml protein 
solution with 100 nl reservoir solution. The crystallization condition of 12.5% 
(w/v) PEG 1000, 12.5% (w/v) PEG 3350, 12.5% (v/v) MPD, 0.03 M of each sodium 
nitrate, disodium hydrogen phosphate, ammonium sulfate, 0.1 M MOPS/ 
HEPES-Na (pH 7.5) was found from the MORPHEUS screen (Molecular 
Dimensions). Seeds were obtained from a fine screen and streak seeding was car- 
ried out in a hanging drop format from an 8 mg/ml protein solution. Larger crystals 
were obtained after 6 days in the original crystallization condition. Crystals were 
soaked in mother-liquor supplemented with 10% (v/v) glycerol before vitrification 
in liquid nitrogen. 

Data collection, phasing and refinement. Diffraction data were collected at the 
Diamond Light Source, beamline I-24 (0.9686 Å, 100 K), and processed using 
DIALS. The crystal structure was determined by molecular replacement in 
Phaser using the structure of the human parkin core (PDB 5N2W) truncated 
after the IBR, as well as a human parkin Ubl structure (PDB 5C1Z). The structure 
was built at 1.80 Å, in multiple rounds of model building in Coot and 
refinement in PHENIX. Phenix ReadySet-derived geometry restraints for the 3CN 
warhead were used, with external restraints defining the linkage points. Final 
Ramachandran statistics: 98.9% favoured, 1.1% allowed, and 0% outliers. Structural 
figures were generated using PyMol (http://www.pymol.org). Data collection and 
refinement statistics can be found in Extended Data Table 1. 

Parkin activity assays. Ub-VS conjugation assays. Indicated parkin variants stored 
in either DTT- or TCEP-containing buffer were incubated with Ub-VS that was 
prepared as described above. The reactions were quenched at indicated time points 
by addition of DTT- and iodoacetamide-containing LDS buffer, resolved on 
4-12% SDS NuPAGE gradient gels (Invitrogen) and stained with Instant Blue 
SafeStain (Expedeon). 

Parkin assembly assays. Wild-type or R104A phospho-parkin (5 µM) was 
incubated in ubiquitination buffer (30 mM HEPES (pH 7.5), 100 mM NaCl, 10 mM 
ATP, 10 mM MgCl2) with HsUBE1 (0.2 µM), UBE2L3 (2 µM) and Ub (20 µM). 
The reactions were quenched at the indicated time points by addition of DTT- and 
iodoacetamide-containing LDS buffer, resolved on 4-12% SDS NuPAGE 
gradient gels (Invitrogen) and transferred to a PVDF membrane (BioRad). 
Membranes were blocked in a 5% (w/v) milk solution in PBS-T (PBS + 0.1% 
(v/v) Tween-20) for 30 min and incubated overnight at 4°C with a ubiquitin- 
recognizing antibody (Ubi-1, NB300-130, Novus Biologicals) in 5% (w/v) BSA in 
PBS-T and 0.1% (w/v) sodium azide. The membrane was then washed with PBS-T, 
incubated for 1 h at room temperature with anti-mouse IgG-HRP (NXA931, GE 
Healthcare) in 5% (w/v) milk in PBS-T, washed in PBS-T and visualized using the 
Amersham Western Blotting Detection Reagent (GE Healthcare) and a ChemiDoc 
Touch Imaging System (BioRad). 

E2-Ub discharge assays. The UBE2D3-Ub conjugate was generated by incubating 
UBE2D3 (20 µM) with HsUBE1 (20 nM) and Ub (80 µM) in ubiquitination 
buffer supplemented with 5 µM CaCl2 at 37°C for 10 min. To remove remaining 
ATP, 0.5 U of Apyrase (NEB) was added and the reaction incubated at 30°C for 
30 min. 

The discharge reaction was studied by addition of 1 µM wild-type or R104A 
phospho-parkin to a diluted charging reaction mixture (final UBE2D3 concentration 
was 9 µM). The reactions were quenched at indicated time points by 
addition of DTT-free LDS buffer, while a final sample was collected at 11 min in 
DTT-containing LDS buffer to assess the extent of isopeptide-linked UBE2D3- 
Ub species formation. Samples were resolved on 4-12% SDS NuPAGE gradient 
gels (Invitrogen) and stained with Instant Blue SafeStain (Expedeon). The gel 
band intensity was quantified in ImageJ by isolating the specific intensity of the 
UBE2D3~Ub thioester band as indicated, subtracting the background of the final 
reduced sample and normalizing within each reaction. 
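
The quantification described above can be sketched in a few lines of Python. This is only an illustration of background subtraction followed by per-reaction normalization; the intensity values are invented, and normalizing to the first time point is an assumption (the text states only that values were normalized within each reaction).

```python
def normalize_band_intensities(raw, background):
    """Subtract a background intensity and scale to the first time point.

    raw        -- thioester band intensities for one reaction, in time order
    background -- intensity of the same band position in the reduced sample
    """
    # Clamp at zero so noise cannot produce negative band intensities.
    corrected = [max(value - background, 0.0) for value in raw]
    reference = corrected[0]
    if reference <= 0:
        raise ValueError("first time point has no signal above background")
    # Assumption: the first time point defines 100% remaining thioester.
    return [value / reference for value in corrected]


# Hypothetical intensities for one discharge time course (arbitrary units).
raw = [1200.0, 900.0, 600.0, 350.0]
background = 100.0
fractions = normalize_band_intensities(raw, background)
```

Each entry of `fractions` is then the fraction of UBE2D3~Ub remaining relative to the start of that reaction, which makes time courses from different lanes or gels comparable.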

Thermal denaturation assays. Protein melting curves were recorded on a Corbett 
RG-6000 real time PCR cycler (30°C to 85°C with 7 s per 0.5°C). Samples contained 
4 µM parkin protein and 4× SYPRO orange in ubiquitination buffer + 5 mM 
TCEP. Melting temperatures were obtained as the maxima of dF/dT versus T plots. All 
data were recorded in triplicate. 
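
As a minimal sketch of how a melting temperature can be read from the maximum of a dF/dT versus T plot, the following Python uses central finite differences on a synthetic curve. The two-state sigmoid, its midpoint of 52 °C and its width are hypothetical values chosen for illustration, not data from this study.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT (central difference) is maximal."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three matched (T, F) points")
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (
            temps[i + 1] - temps[i - 1]
        )
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Synthetic unfolding curve sampled every 0.5 °C from 30 °C to 85 °C,
# matching the ramp described in the text.
temps = [30 + 0.5 * i for i in range(111)]
true_tm = 52.0  # hypothetical midpoint, for illustration only
curve = [1.0 / (1.0 + math.exp(-(t - true_tm) / 1.5)) for t in temps]
tm = melting_temperature(temps, curve)
```

With real SYPRO orange data the derivative is usually smoothed first, because the raw signal is noisy and fluorescence drops again after the transition.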

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Coordinates and structure factors have been deposited with the 
Protein Data Bank under accession code 6GLC. Uncropped versions of all gels 
are displayed in Supplementary Fig. 1. All reagents and data are available upon 
reasonable request from the corresponding author. 
Extended Data Fig. 1 | Mechanisms of parkin autoinhibition. 
a, Structure of autoinhibited, full-length human parkin (PDB 5C1Z) 
shown schematically (top, as in Fig. 1a) and in cartoon representation 
in the same colours. Two insets show the UPD-RING2 interface (with 
Cys431 shown in ball-and-stick representation), and the blocked E2 
binding site (with the E2 position, modelled according to PDB 5EDV, 
shown as grey surface). Zn ions are shown as grey spheres. b, An ‘open- 
… coloured white on each surface. c, Structure of phospho-ubiquitin bound 
to full-length parkin (PDB 5N2W) as in a. Phospho-ubiquitin binding 
leads to helix straightening, and IBR domain repositioning, which 
releases the Ubl domain for phosphorylation. In the shown structures of 
unphosphorylated parkin, the Ubl and REP (red) inhibit E2 binding, and 
the RING2-UPD interface is intact, with Cys431 being inaccessible. The 
Ubl-UPD linker was removed from crystallized constructs in a and c. 
Extended Data Fig. 2 | Sample preparation for HDX-MS and selected 
raw data. a, Representative LC-MS spectrum of the prepared Ub-VS 
probe (see Methods). Experiment was performed in duplicate. 
b, Representative LC-MS spectrum of Ub-VS-reacted phospho-parkin. 
Experiment was performed in duplicate. c, Samples used in HDX-MS 
analysis. In HDX-MS, non-covalent complexes with phospho-ubiquitin 
were used. Covalent complexes are indicated with a dash and non-covalent 
Extended Data Fig. 3 | Graphical representation of HDX-MS data. 
Data from HDX-MS experiments (Fig. 1b-e) were plotted onto a stylized 
‘open domain model of parkin, with identical colouring (blue, more 
protected from solvent exchange compared to previous state; red, less 
protected from solvent exchange compared to previous state). Grey regions 
correspond to peptides that were not covered or could not be analysed 
owing to modification. Schematic domain representations indicate an 
average change of the corresponding interfaces across all time points. 
White regions indicate no change. a, Parkin compared to parkin-phospho- 
ubiquitin. b, Parkin-phospho-ubiquitin compared to phospho-parkin- 
phospho-ubiquitin. c, Phospho-parkin-phospho-ubiquitin compared 

to phospho-parkin-phospho-ubiquitin in complex with an isopeptide 
UBE2L3-Ub thioester mimetic (see Methods). This experiment confirmed 
a previously reported binding site for the E2-conjugated ubiquitin on the 


RBR®?S (8). d, Phospho-parkin—phospho-ubiquitin compared to Ub-VS- 
reacted phospho-parkin-phospho-ubiquitin. Reaction with Ub-VS leads 
to modification of the catalytic Cys431-containing-peptide, generating 
non-identical peptides precluding comparison by HDX-MS. Low 
coverage of the RING2 domain can be explained by ubiquitin resistance 
to pepsin cleavage, leading to protection of the linked RING2 domain 
and subsequent peptide loss. To allow comparison, these peptides were 
also omitted from analysis of the UBE2L3-Ub-bound sample. In c and 
d, the structure representation is deceiving because REP and RING2 are 
highly mobile and are no longer bound to the parkin core. Indeed, the 
high hydrogen-deuterium exchange in the REP sequence in active parkin 
(Fig. 1d, e; peptide (4) in Extended Data Fig. 2d) indicates an additional 
loss of secondary structure in this helical element when REP and RING2 
are released. 
Extended Data Fig. 5 | Tsparkin and pre-crystallization biochemistry 
for human parkin. a, HDX-MS experiment comparing phospho-Tsparkin 
reacted with phospho-ubiquitin-C3Br and phospho-Tsparkin reacted 
with phospho-ubiquitin-C3Br and Ub-VS with identical colouring (blue, 
more protected from solvent exchange; red, less protected from solvent 
exchange; grey, not covered in all of the compared states, see Fig. 1). The 
experiment was performed in technical triplicate. The Tsparkin profile 
is highly similar to the profile of human parkin in an analogous state 
(Fig. 1e). Higher peptide resolution in this sample reveals protection of 
the RING2 interface by reacted Ub-VS, but the C terminus of RING2 
that binds to the UPD interface is surface exposed. Both phospho-Ubl 
and the Ubl-UPD linker are protected in activated parkin. b, Limited 
proteolysis of Tsparkin with elastase, in different stages of activation. In 
unphosphorylated, autoinhibited Tsparkin, the Ubl is cleaved off in the 
Ubl-UPD linker. In activated forms of Tsparkin (phospho-Tsparkin, 
phospho-Tsparkin reacted with phospho-ubiquitin-C3Br, phospho- 
Tsparkin reacted with phospho-ubiquitin-C3Br and Ub-VS), the RING2 
is readily cleaved off, while the Ubl is not efficiently removed. This 
suggests that the Ubl-UPD linker is not accessible in activated forms of 
Tsparkin. A representative gel from three independent experiments is 
shown. For gel source data, see Supplementary Fig. 1. c, A TEV cleavage 
site was introduced after the IBR domain, so that after activation by 
phospho-ubiquitin and Ubl-phosphorylation, the released RING2 
domain can be removed. Once removed, RING2 is no longer stably 
associated with the remaining parkin core. Shown is a gel filtration profile 
illustrating this point. A representative profile from three independent 
experiments is shown. d, SDS-PAGE analysis of sample preparation 
process (see Methods). Asterisk denotes ubiquitin probe (Ub-C3Br)- 
reacted material that modifies the RING2 catalytic Cys, which explains the 
cleaved, probe-reacted RING2 band (asterisk in step 3). A representative 
gel from three independent experiments is shown. For gel source data, see 
Supplementary Fig. 1. e, HDX-MS experiment on Tsparkin, comparing 
phospho-Tsparkin reacted with phospho-ubiquitin-C3Br with phospho- 
Tsparkin reacted with phospho-ubiquitin-C3Br and Ub-VS (bottom) 
or with RING2-TEV-cleaved phospho-Tsparkin reacted with phospho- 
ubiquitin-C3Br (top), coloured as in a. Identical profiles were obtained, 
showing that RING2 removal has no effect on the activated core of parkin. 
This further indicates that RING2 acts independently of the parkin 
core upon full activation. Notably, in both comparisons, we observed 
concomitant protection of phospho-Ubl and the Ubl-UPD linker. The 
experiment was performed in technical triplicate. 
Extended Data Fig. 6 | Quality control and electron density maps 
for human phospho-parkin-phospho-ubiquitin. a, LC-MS spectrum 
of crystallized human phospho-parkin (amino acids 1-382) bound 
to phospho-ubiquitin. This is representative of two independent 
experiments. b, Composite omit map (generated with simulated 
annealing) shown for the single complex in the asymmetric unit. 
2|Fo| − |Fc| electron density is shown at 1σ. c, Electron density as in b for 
the Ubl-UPD linker. d, Electron density as in b for the Ser65 phospho- 
Ubl binding site on the UPD linker. e, Electron density as in b for the 
Ser65 phospho-Ub binding site. As we are missing electron density for 
disordered regions in the Ubl-ACT and ACT-UPD linkers, we cannot 
exclude the possibility that phospho-Ubl may interact in trans with a 
neighbouring parkin molecule. Also see Extended Data Table 1. 
Extended Data Fig. 7 | The phospho-Ubl binding site on the UPD. 
a, Side-by-side view of phospho-parkin-phospho-ubiquitin (left) and 
parkin-phospho-ubiquitin (PDB 5N2W, right), and superposition of 
both (below). The green Ubl domain changes position by >50 Å. 
b, E2-Ub from the structure of the HOIP RBR domain in complex with 
UBE2D2-Ub was modelled onto phospho-parkin-phospho-ubiquitin, 
by superposition of the RING1 domains of each complex. The E2- 
conjugated ubiquitin molecule in the ‘open’ conformation binds to the 
previously recognized cryptic ubiquitin binding interface on RING1- 
IBR. The contact points correlate with HDX-MS data (Fig. 1d, Extended 
Data Figs. 2, 3c). c, HDX-MS data from Fig. 1e were plotted onto the 
phospho-parkin-phospho-ubiquitin structure with identical colouring 
(blue, more protected from solvent exchange; red, less protected from 
solvent exchange; grey, not covered in all of the compared states, compare 
with Fig. 1). Protected regions on UPD match the observed phospho-Ubl 
interface. d, HDX-MS experiments comparing parkin with a mutation in 
the phospho-acceptor binding site on the UPD (phospho-parkin(K211N)- 
phospho-ubiquitin) compared with phospho-parkin-phospho-ubiquitin, 
coloured as in c. The mutant is unable to protect the Ubl, and to release 
RING2 and REP. Experiments were done as technical triplicate. 
Extended Data Fig. 8 | A regulatory role of the parkin Ubl-UPD linker. 
a, b, E2 discharge assay resolved on a Coomassie stained SDS-PAGE 
gel (a) and quantified from band intensities (b) for phospho-parkin 
and phospho-parkin(R104A). This is representative of at least two 
independent experiments; for gel source data, see Supplementary Fig. 1. 
The mutation in the ACT element leads to a reduction in discharge 
activity, suggesting that the residue is required to dislodge RING2 from 
the parkin core. c, Parkin(R104A) is as stable as wild-type parkin, 
in the unphosphorylated or phosphorylated form. Thermal denaturation 
experiments were performed as technical triplicate. d, Sequence detail 
of the Ubl-UPD linker, which contains the ACT element described 
here. In the ACT element as bound to phospho-parkin-phospho- 
ubiquitin, the positions of two annotated (in PhosphoSitePlus) parkin 
phosphorylation sites, Ser101 and Ser108, are resolved. Phosphorylation 
of Ser101 decreases parkin activity*°, which is probably explained by 
phosphorylation preventing phospho-Ubl and/or linker binding to the 
UPD. It is hence highly likely that phosphorylation of parkin on these 
residues provides additional layers of parkin regulation that remain to be 
uncovered in future work. As an example, parkin phosphorylation by PKA 
was recently reported to be a mechanism of parkin inhibition in beige- 
to-white adipocyte transition, although phosphorylation sites remained 
unclear*!. Residues before the ACT element (amino acids 73-99) and after 
the ACT element (amino acids 109-142) are disordered in our structure. 
The last ordered residue, Ser108, is tantalizingly close to the REP binding 
site as well as to the phospho-ubiquitin binding pocket, but disorder 
suggests that clear binding sites for other conserved linker residues, in 
particular for the parkin GLAVIL motif, are not present. HDX-MS also 
does not reveal additional protection of the linker, even when the E2-Ub 
conjugate is bound, suggesting that the GLAVIL motif may not bind the 
E2 (Fig. 1d, Extended Data Figs. 2, 3c). On the other hand, there are at 
least three additional annotated phosphorylation sites, Ser116, Ser131 
and Ser136'>*°?43, suggesting that the second part of the linker may 
also be regulated. Phosphorylation on these residues could change the 
ability of the disordered parts of the linker to interact with parkin in cis. 
For example, we would speculate that a phosphorylated Ser116 could 
reach the phosphate binding pocket occupied by phospho- 
Ser65 of ubiquitin. Alternatively, the remaining Ubl-UPD linker may 
be important for substrate recruitment, or involved in other, PINK1- 
independent mechanisms of parkin activation. 
Extended Data Table 1 | Data collection and refinement statistics 

                                  phospho-parkin (1-382) - phospho-Ub 
Data collection 
  Space group                     P3221 
  Cell dimensions 
    a, b, c (Å)                   83.93, 83.93, 105.12 
    α, β, γ (°)                   90, 90, 120 
  Resolution (Å)                  72.69-1.80 (1.84-1.80)* 
  Rmerge                          0.065 (0.773) 
  I/σI                            13.80 (2.40) 
  Completeness (%)                100.00 (99.30) 
  Redundancy                      6.7 (6.7) 

Refinement 
  Resolution (Å)                  59.79-1.80 
  No. reflections / test set      40229 / 2020 
  Rwork / Rfree                   0.180 / 0.205 
  No. atoms 
    Protein                       3039 (398 aa) 
    Ligand/ion 
    Water 
  B-factors 
    Protein 
    Ligand/ion 
    Water 
  R.m.s. deviations 
    Bond lengths (Å) 
    Bond angles (°) 

*Values in parentheses are for highest-resolution shell. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


https://doi.org/10.1038/s41586-018-0319-4 


Resistance-gene-directed discovery of a natural-product 
herbicide with a new mode of action 


Yan Yan, Qikun Liu, Xin Zang, Shuguang Yuan, Undramaa Bat-Erdene, Calvin Nguyen, Jianhua Gan, Jiahai Zhou, 
Steven E. Jacobsen & Yi Tang 


Bioactive natural products have evolved to inhibit specific cellular 
targets and have served as lead molecules for health and agricultural 
applications for the past century!3. The post-genomics era has 
brought a renaissance in the discovery of natural products using 
synthetic-biology tools*-®. However, compared to traditional 
bioactivity-guided approaches, genome mining of natural products 
with specific and potent biological activities remains challenging*. 
Here we present the discovery and validation of a potent herbicide 
that targets a critical metabolic enzyme that is required for plant 
survival. Our approach is based on the co-clustering of a self- 
resistance gene in the natural-product biosynthesis gene cluster”~°, 
which provides insight into the potential biological activity of the 
encoded compound. We targeted dihydroxy-acid dehydratase in the 
branched-chain amino acid biosynthetic pathway in plants; the last 
step in this pathway is often targeted for herbicide development?®. 
We show that the fungal sesquiterpenoid aspterric acid, which was 
discovered using the method described above, is a sub-micromolar 
inhibitor of dihydroxy-acid dehydratase that is effective as a 
herbicide in spray applications. The self-resistance gene astD was 
validated to be insensitive to aspterric acid and was deployed as a 
transgene in the establishment of plants that are resistant to aspterric 
acid. This herbicide-resistance gene combination complements the 
urgent ongoing efforts to overcome weed resistance!!. Our discovery 
demonstrates the potential of using a resistance-gene-directed 
approach in the discovery of bioactive natural products. 

Weeds are a major source of crop losses, and the evolution of 
herbicide resistance in weeds has led to an urgent need for new herbi- 
cides with novel modes of action!!“!4. The branched-chain amino acid 
(BCAA) biosynthetic pathway is essential for plant growth”, It is not 
present in animals and is therefore a validated target for highly specific 
weed-control agents’”. The BCAA biosynthetic pathway in plants is 
carried out by three enzymes: acetolactate synthase (ALS), acetohy- 
droxy acid isomeroreductase (KARI), and dihydroxyacid dehydratase 
(DHAD) (Fig. 1a). Given the success of targeting ALS for herbicide 
development", it is notable that no herbicide that targets either of the 
other two enzymes has been developed. DHAD is an essential 
and highly conserved enzyme among plant species that catalyses 
β-dehydration reactions to yield α-keto acid precursors to isoleucine, 
valine and leucine'*'* (Extended Data Fig. 1a, Supplementary Fig. 1). 
Efforts towards synthetic DHAD inhibitors resulted in compounds with 
submicromolar inhibition constants (Ki); however, the compounds 
have no reported in planta activity’’ (Extended Data Fig. 1b). 

Filamentous fungi are prolific producers of natural products, many 
of which have biological activities that aid the fungi in colonizing 
and killing plants!*!®. Therefore, fungal natural products repre- 
sent a promising source of potential leads for herbicides. The abun- 
dance of sequenced fungal genomes enables genome mining of new 


natural products with novel biological activities*®. Although no natural- 
product inhibitors of DHAD are known to date, we reason that a fungal 
natural product with this property might exist, given the indispensable 
role of BCAA biosynthesis in plants’”, 

To identify natural-product biosynthetic gene clusters that may 
encode a DHAD inhibitor, we hypothesized that such a cluster must 
contain an additional copy of DHAD that is insensitive to the inhibitor, 
thereby providing the required self-resistance for the producing organ- 
ism to survive. Genes encoding a self-resistance enzyme are frequently 
found in microbial natural-product gene clusters, as highlighted by the 
presence of an insensitive copy of HMG-CoA reductase (HMGR) and 
inosine monophosphate dehydrogenase (IMPDH) in the gene clusters 
of lovastatin (that targets HMGR) and mycophenolic acid (that targets 
IMPDH), respectively'?”° (Extended Data Fig. 1c). This phenomenon 
has been used to predict molecular targets of natural products, as well 
as to identify gene clusters of natural products of known activities>””. 

To identify possible self-resistance enzymes, we scanned sequenced 
fungal genomes to search for co-localization of genes encoding DHAD 
with core biosynthetic enzymes, such as terpene cyclases and polyketide 
synthases among others*!”. We identified a well-conserved set of four 
genes across multiple fungal genomes (Fig. 1b), including the common 
soil fungus Aspergillus terreus that is best known for producing lovasta- 
tin. The conserved gene clusters include genes that encode a sesquiter- 
pene cyclase homologue (astA), two cytochrome P450 genes (astB and 
astC) and a homologue of DHAD (astD). Genes outside of this cluster 
are not conserved across the identified genomes and are hence unlikely 
to be involved in the biosynthesis of natural products. AstD is the sec- 
ond copy of DHAD encoded in the genome, and is approximately 70% 
similar to the housekeeping copy that is well-conserved across fungi 
(Supplementary Fig. 2). Therefore, AstD is potentially a self-resistance 
enzyme that confers resistance to the encoded natural product. As with 
a majority of biosynthetic gene clusters in sequenced fungal genomes, 
the ast cluster has not been associated with the production of a known 
natural product’. 
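
The co-localization search described above can be illustrated with a toy Python sketch. It is not the authors' pipeline: the `Gene` record, the family labels, the coordinates and the 20-kb window are all hypothetical, and a real scan would start from homology searches against annotated fungal genomes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    contig: str
    start: int
    end: int
    family: str  # e.g. "DHAD", "terpene_cyclase", "P450", "other"


# Core biosynthetic enzyme families named in the text (labels are illustrative).
CORE_FAMILIES = {"terpene_cyclase", "polyketide_synthase"}


def candidate_clusters(genes, target_family="DHAD", window=20_000):
    """List target-family genes that lie near a core biosynthetic gene."""
    hits = []
    for target in genes:
        if target.family != target_family:
            continue
        neighbours = [
            g for g in genes
            if g is not target
            and g.contig == target.contig
            and g.family in CORE_FAMILIES
            and abs(g.start - target.start) <= window
        ]
        if neighbours:
            hits.append((target.name, sorted(g.name for g in neighbours)))
    return hits


# Toy annotation: one clustered DHAD copy and one isolated housekeeping copy.
genes = [
    Gene("astA", "contig1", 1_000, 2_500, "terpene_cyclase"),
    Gene("astB", "contig1", 3_000, 4_800, "P450"),
    Gene("astC", "contig1", 6_000, 7_700, "P450"),
    Gene("astD", "contig1", 9_000, 10_900, "DHAD"),
    Gene("housekeeping_DHAD", "contig2", 50_000, 51_900, "DHAD"),
]
hits = candidate_clusters(genes)
```

Only the DHAD copy that sits next to a core biosynthetic gene is reported, which is the signature of a putative self-resistance gene; the isolated housekeeping copy is ignored.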

To identify the natural product encoded by the ast cluster, we 
heterologously expressed astA, astB and astC genes in the host 
Saccharomyces cerevisiae RCO1*3. New compounds that emerged were 
purified and their structures were elucidated using NMR spectroscopy 
(Supplementary Fig. 3, Supplementary Table 5). RCO1 cells expressing 
only astA produced a new sesquiterpene (1), which was confirmed to be 
(—)-daucane (Supplementary Fig. 4). RCO1 cells expressing both astA 
and astB led to the biosynthesis of a new product that was determined 
structurally to be the α-epoxy carboxylate (2) (Fig. 1c). When astA, 
astB and astC were expressed together, a new compound (3) became the 
dominant product (approximately 20 mg l−1). Full structural determination 
revealed 3 to be the tricyclic aspterric acid, which is a previously 
isolated compound” (Fig. 1c). The biosynthetic pathway for aspterric 
Fig. 1 | Genome mining of a DHAD inhibitor and biosynthesis of 
aspterric acid. a, Valine, leucine and isoleucine are produced by two 
parallel pathways using three enzymatic steps: ALS, KARI and DHAD. 
b, A 17-kb biosynthetic gene cluster (BGC) from A. terreus containing 
four open reading frames (ORFs), which are also conserved among several 
fungal species. astA has sequence homology to sesquiterpene cyclase; astB 
and astC are predicted to be P450 monooxygenases; astD is predicted to 
encode a DHAD and is proposed to confer self-resistance in the presence 
of the natural product produced in the cluster. c, High-performance liquid 
chromatography-mass spectrometry (HPLC-MS) traces of metabolites 


acid is therefore concise: after cyclization of farnesyl diphosphate by 
AstA to create the carbon skeleton in 1, AstB catalyses oxidation of 1 
to yield the epoxide 2. Further oxidation by AstC at carbon 15 yields an 
alcohol, which can undergo intramolecular epoxide opening to create 
aspterric acid (Fig. 1d). 

Upon its initial discovery, aspterric acid was shown to have inhibitory 
activity towards Arabidopsis thaliana, however, the mode of action was 
not known’’. Our resistance-gene-directed approach led to rediscovery 
of this compound with DHAD as a potential target. We first confirmed 
that aspterric acid is able to potently inhibit A. thaliana growth in an 
agar-based assay (Fig. 2a, Supplementary Fig. 5). Aspterric acid was 
also an effective inhibitor of root development and plant growth when 
applied to a representative monocot (Zea mays) and dicot (Solanum 
lycopersicum) (Fig. 2b). To test whether aspterric acid indeed targets 
DHAD, we expressed and purified housekeeping DHAD from both 
A. terreus (XP_001208445.1, AteDHAD) and A. thaliana (AT3G23940, 
AthDHAD), as well as the putative self-resistance enzyme AstD 
(Supplementary Fig. 6). Both housekeeping DHAD enzymes converted 
dihydroxyisovalerate to ketoisovalerate (AthDHAD: kcat = 1.2 s−1, 
KM = 5.7 mM) as expected. The enzyme activities, however, were 
inhibited in the presence of aspterric acid (Extended Data Fig. 2). The 
half-maximal inhibitory concentration (IC50) values of aspterric acid 
towards AteDHAD and AthDHAD were 0.31 µM and 0.50 µM, respectively, 
at an enzyme concentration of 0.50 µM (Extended Data Fig. 3). 
Aspterric acid was further determined to be a competitive inhibitor of 
AthDHAD with a Ki = 0.30 µM (Extended Data Fig. 3). Aspterric acid 
displayed no significant cytotoxicity towards human cell lines up to 
500 µM concentration, consistent with the lack of DHAD in mammalian 
cells (Supplementary Fig. 7). 
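
To make the reported kinetic constants concrete, the sketch below evaluates the classical competitive-inhibition rate law with the AthDHAD values quoted above (kcat = 1.2 s−1, KM = 5.7 mM, Ki = 0.30 µM, 0.50 µM enzyme). The 10 mM substrate concentration is an assumption made only for illustration. Because the enzyme concentration is comparable to Ki, the classical model (which treats free and total inhibitor as equal) is at best a rough approximation here.

```python
def competitive_rate(s_mM, i_uM, kcat=1.2, km_mM=5.7, ki_uM=0.30, enzyme_uM=0.50):
    """Initial rate (µM product per second) under competitive inhibition.

    v = kcat * [E] * [S] / (KM * (1 + [I]/Ki) + [S])
    """
    # A competitive inhibitor raises the apparent KM and leaves Vmax unchanged.
    km_app = km_mM * (1.0 + i_uM / ki_uM)
    return kcat * enzyme_uM * s_mM / (km_app + s_mM)


substrate_mM = 10.0  # assumed substrate concentration, not from the text
v_uninhibited = competitive_rate(substrate_mM, i_uM=0.0)
v_at_ki = competitive_rate(substrate_mM, i_uM=0.30)
v_saturating = competitive_rate(1e6, i_uM=0.30)  # approaches Vmax = kcat * [E]
```

At very high substrate concentration the rate approaches kcat × [E] even with inhibitor present, which is the defining behaviour of competitive inhibition.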

AstD catalyses the identical β-dehydration reaction as DHAD, albeit 
with a significantly slower turnover rate (kcat = 0.03 s−1, KM = 5.4 mM). 
produced from S. cerevisiae RCO1 expressing the different ast genes under 
the control of the PADH2 promoter. Control, S. cerevisiae without expression 
plasmids; S. cerevisiae transformed with plasmids expressing astA and 
astB, which produces 2; and S. cerevisiae transformed with plasmids 
expressing astA-C, which produces aspterric acid at a titre of 20 mg l−1. 
The experiments were repeated independently with similar results three 
times. EIC, extracted ion chromatogram. d, Proposed biosynthetic 
pathway of aspterric acid. AstA cyclizes farnesyl diphosphate (FPP) into 1 
and the P450 enzymes AstB and AstC then sequentially transform 1 into 2 
and aspterric acid (3, AA), respectively. MM, molecular mass. 


However, the enzyme was not inhibited by aspterric acid, even at the 
solubility limit of 8 mM (Extended Data Fig. 3). To determine if AstD 
can confer resistance to aspterric acid-sensitive strains, we developed 
a yeast-based assay. The genome copy of DHAD encoded by ILV3 was 
first deleted from the S. cerevisiae strain DHY ΔURA3, which resulted 
in an auxotroph that requires exogenous addition of Ile, Leu and Val 
to grow. We introduced either the gene encoding AteDHAD or astD 
episomally into the ILV3 knockout strain and found that both genes 
enabled the cells to grow in the absence of the three BCAAs (Extended 
Data Fig. 4). However, yeast cells expressing AteDHAD were approximately 
100 times more sensitive to aspterric acid (IC50 of 2 µM) compared 
to yeast expressing AstD (IC50 of 200 µM) (Fig. 2c). Collectively, 
the biochemical and genetic assays validated that aspterric acid is, to 
our knowledge, the first natural-product inhibitor of fungal and plant 
DHAD, and that AstD serves as the self-resistance enzyme in the ast 
biosynthetic gene cluster. 
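
The 100-fold difference in sensitivity follows directly from the two IC50 values. The sketch below assumes a simple Hill-type dose-response curve with a Hill coefficient of 1; the curve shape is an assumption for illustration, and only the IC50 values (2 µM and 200 µM) come from the text.

```python
def relative_growth(conc_uM, ic50_uM, hill=1.0):
    """Fraction of uninhibited growth at a given inhibitor concentration."""
    return 1.0 / (1.0 + (conc_uM / ic50_uM) ** hill)


IC50_ATE_DHAD_UM = 2.0    # yeast expressing housekeeping AteDHAD
IC50_ASTD_UM = 200.0      # yeast expressing AstD

fold_resistance = IC50_ASTD_UM / IC50_ATE_DHAD_UM

# Growth of both strains at a concentration between the two IC50 values.
growth_ate = relative_growth(20.0, IC50_ATE_DHAD_UM)
growth_astd = relative_growth(20.0, IC50_ASTD_UM)
```

Under these assumptions, at 20 µM aspterric acid the AteDHAD strain retains under a tenth of its growth while the AstD strain retains about nine-tenths.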

The (R)-α-hydroxyacid and (R)-configured β-ether oxygen moieties 
in aspterric acid mimic the (2R, 3R)-dihydroxy groups present in 
natural substrates such as dihydroxyisovalerate. The β-ether oxygen in 
aspterric acid is in a position to coordinate to the 2Fe-2S cluster that is 
a required cofactor in both fungal and plant DHAD. To understand 
the potential mechanism of action of aspterric acid, we determined the 
crystal structure (2.11 Å) of AthDHAD in complex with the 2Fe-2S 
cluster (holo-AthDHAD) (Fig. 2d, Extended Data Fig. 5, Extended Data 
Table 1). We identified a binding chamber at the homodimer interface, 
similar to that found in the holo bacterial L-arabinonate dehydratase 
(Fig. 2d). The interior of the chamber is positively charged (2Fe-2S 
and Mg2+) whereas the entrance is lined with hydrophobic residues. 
The modelled binding modes of α,β-dihydroxyisovalerate and aspterric 
acid predicted by computational docking are shown in Fig. 2e. The 
pocket is sufficiently spacious to accommodate the bulkier aspterric 



Fig. 2 | Aspterric acid is a plant growth inhibitor. a, Two-week-old 
A. thaliana growing on Murashige and Skoog basal medium containing 
no aspterric acid (left) or 50 µM aspterric acid (right). The picture shown 
is representative of three replicates. b, Same as in a, except for two-week- 
old dicotyledon S. lycopersicum and monocotyledon Z. mays. The picture 
shown is representative of two replicates. c, Verification of the self- 
resistance function of AstD. Growth-inhibition curve of aspterric acid on 
S. cerevisiae ΔILV3 strains expressing fungal housekeeping AteDHAD (blue) 
or AstD (red) in isoleucine, leucine and valine (ILV) dropout medium. 
Data are mean ± s.d. from three biologically independent experiments. 
d, Crystal structure of dimeric holo-AthDHAD containing the cofactor 
2Fe-2S cluster and a Mg2+ ion with the docked aspterric acid in the 


acid, and provides stronger hydrophobic interactions than the native 
substrate with a 5.3 ± 0.3 kcal mol−1 gain in binding energy (Fig. 2e). 
On the basis of the holo-AthDHAD structure, we constructed a homology 
model of AstD to determine the potential mechanism of resistance 
(Extended Data Figs. 5, 6). Comparison of AthDHAD and the 
modelled AstD structures shows that although most of the residues 
in the catalytic chamber are conserved, the hydrophobic region at the 
entrance to the reactive chamber in AstD is more constricted as a result 
of two amino acid substitutions (V496L and I177L). Narrowing of the 
entrance could therefore sterically exclude the bulkier aspterric acid 
from binding in the active site, whereas the smaller, natural substrates 
are still able to enter the chamber. 
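
For a sense of scale, a binding-energy difference can be converted into an affinity ratio with the standard relation ratio = exp(ΔΔG / RT). The sketch below applies it to the 5.3 kcal mol−1 docking estimate quoted above at an assumed temperature of 298.15 K. Docking energies are approximate, so the result is an order-of-magnitude illustration only, not a measured quantity.

```python
import math

GAS_CONSTANT_KCAL = 1.98720425864e-3  # kcal mol^-1 K^-1


def affinity_ratio(delta_delta_g_kcal, temperature_k=298.15):
    """Fold change in binding affinity implied by a binding-energy gain."""
    return math.exp(delta_delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))


# Docking estimate from the text; the temperature is an assumption.
ratio = affinity_ratio(5.3)
ratio_low = affinity_ratio(5.3 - 0.3)
ratio_high = affinity_ratio(5.3 + 0.3)
```

The reported ±0.3 kcal mol−1 uncertainty already changes the implied ratio by more than a factor of two, which is why such estimates are best read qualitatively.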

To explore the potential of aspterric acid as an herbicide, we per- 
formed spray treatment of A. thaliana with aspterric acid. We added 
aspterric acid into a commercial glufosinate formulation known as 
Finale at a final concentration of 250 µM. We then sprayed aspterric 
acid solution onto glufosinate-resistant A. thaliana. Finale alone had 
no observable inhibitory effects on plant growth, but adding aspterric 
acid severely inhibited plant growth (Extended Data Fig. 7). In addition, 
A. thaliana plants treated with aspterric acid before flowering failed to 
form normal pollen, which has also been observed previously”>. We 
found that the pistil of treated plants could still be successfully polli- 
nated using healthy pollen from the untreated A. thaliana, indicating 
that aspterric acid preferentially affects pollen but not egg formation 
(Extended Data Figs. 8, 9). This effect was also observed with a lower 
concentration of aspterric acid (100 µM). Thus, in addition to its herbicidal 
properties, aspterric acid could potentially be used as a chemical 
hybridization agent for hybrid seed production”. 

We next investigated whether plants expressing astD are resistant 
to aspterric acid. This was motivated by the successful combination of 
glyphosate and genetically modified crops that are selectively resistant 
to glyphosate (Roundup Ready)*°. The A. terreus astD gene was codon 
optimized and the N terminus was fused to a chloroplast localization 
signal derived from AthDHAD. Wild-type or astD transgene-expressing 




active site. One of the AthDHAD monomers is shown in cyan, whereas the 
other one is shown in electrostatic surface representation. The docked 
aspterric acid is shown inset as a space-filling model. The hydrophobic 
portions of aspterric acid are surrounded by several hydrophobic residues 
(white spheres) from both monomers. e, Cross-section electrostatic map 
of modelled holo-AthDHAD in the binding site. Red surface map, the 
normalized negatively charged regions; blue surface map, the normalized 
positively charged regions; white surface map, the hydrophobic regions. 
The docked aspterric acid in the active site of AthDHAD is shown on the 
left, and the docked native substrate dihydroxyisovalerate is shown on 
the right. The docking studies suggest the hydrophobic entrance to the 
reaction chamber preferentially binds the bulkier, tricyclic aspterric acid. 
Fig. 3 | Aspterric acid-resistance of Arabidopsis plants expressing astD 
transgenes. a, Phenotype of ten-day-old A. thaliana with (lower) and 
without (upper) the astD transgene growing on medium containing 100 µM 
aspterric acid. Control plants were transformed with a vector that carries the 
glufosinate ammonium selection marker but no astD transgene. The picture 
shown is representative of three replicates. b, Fresh weight of three-week- 
old Arabidopsis seedlings growing on medium with (red box) and without 
(blue box) 100 µM aspterric acid. Box plots show the median and whiskers 
extend to the first and third quartiles, with the individual data points from 
21 biologically independent experiments overlaid. c, Glufosinate-resistant 
Arabidopsis with (lower) and without (upper) astD transgene growing in soil 
were sprayed with glufosinate ammonium with (left) and without (right) 
250 µM aspterric acid. d, Quantification of the height of Arabidopsis treated 
as in c. Data are mean ± s.d. from 12 biologically independent experiments. 
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A. thaliana was then grown on medium that contained 100 1M aspter- 
ric acid. In the presence of aspterric acid, the growth of wild-type plants 
was strongly inhibited, and arrested at the cotyledon stage (Fig. 3a). By 
contrast, the growth of astD transgenic plants was relatively unaffected 
by aspterric acid, as indicated by the normally expanded rosette leaves, 
elongated roots and whole-plant fresh weight (Fig. 3a, b). The expres- 
sion of AstD was verified by western blot (Supplementary Fig. 8). A 
spray assay was also performed using T2 astD transgenic A. thaliana 
plants, which showed no observable growth defects under such treat- 
ment (Fig. 3c). By contrast, the control plants carrying the empty vector 
showed a strong growth inhibitory phenotype when treated with asp- 
terric acid (Fig. 3c). Quantitative measurements of plant height showed 
that AstD effectively confers aspterric acid resistance to A. thaliana 
(Fig. 3d). 

In summary, resistance-gene-directed discovery of natural products 
in the fungus A. terreus led to the discovery of a natural herbicide asp- 
terric acid and the determination of its mode of action. In addition, 
introducing astD as a transgene or editing the sequence of the plant 
DHAD endogenous gene could be used to create aspterric acid-resistant 
crops. We suggest that aspterric acid is a promising lead for development
as a broad-spectrum commercial herbicide.

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Materials. Biological reagents, chemicals, media and enzymes were purchased 
from standard commercial sources unless stated. The plant, fungal, yeast and 
bacterial strains, plasmids and primers used in this study are summarized in 
Supplementary Tables 3, 4 and 5. DNA and RNA manipulations were carried out 
using Zymo ZR Fungal/Bacterial DNA Microprep kit and Invitrogen Ribopure 
kit respectively. DNA sequencing was performed at Laragen. The primers and 
codon-optimized gblocks were synthesized by IDT. 

Expression of ast genes in Aspergillus nidulans for cDNA isolation. Plasmids 
pYTU, pYTP, pYTR were digested with PacI and SwaI and used as vectors to
insert genes. A gpda promoter was generated by PCR amplification using primers
Gpda-pYTU-F and Gpda-R with pYTR serving as template. Genes to be expressed 
were amplified using PCR with the genomic DNA of A. terreus NIH2624 as a tem- 
plate. A 4.5-kb fragment obtained using primers AstD-pYTU-recomb-F and AstA- 
pYTU-recomb-R was cloned into pYTU together with a gpda promoter by yeast 
homologous recombination to obtain pAstD+AstA-pYTU. Yeast transformation 
was performed using Frozen-EZ Yeast Transformation II Kit (Zymo Research). A 
2.4-kb fragment obtained using primers AstB-pYTR-recomb-F and AstB-pYTR- 
recomb-R was cloned into pYTR by yeast homologous recombination to obtain 
pAstB-pYTR. Similarly, a 2.3-kb fragment obtained using primers AstC-pYTP- 
recomb-F and AstC-pYTP-recomb-R was cloned into pYTP by yeast homologous 
recombination to obtain pAstC-pYTP. 

All three plasmids (pAstD+AstA-pYTU, pAstB-pYTR and pAstC-pYTP)
were transformed into A. nidulans according to standard protocols to result in the
A. nidulans strain TY01. TY01 was cultured in liquid CD-ST medium (20 g l⁻¹
starch, 20 g l⁻¹ peptone, 50 ml l⁻¹ nitrate salts and 1 ml l⁻¹ trace elements) at 28°C
for 3 days. Total RNA of TY01 was extracted with the Invitrogen Ribopure kit, and 
total cDNA of TY01 was obtained using the SuperScript III reverse transcriptase 
kit (Thermo Fisher Scientific). The cDNA fragment of astA was PCR amplified 
using primers AstA-xw55-recomb-F and AstA-xw55-recomb-R. The cDNA frag- 
ment of astB was PCR amplified using primers AstB-xw06-recomb-F and AstB- 
xw06-recomb-R. The cDNA fragment of astC was PCR amplified using primers 
AstC-xw02-recomb-F and AstC-xw02-recomb-R. The cDNA fragment of astD 
was PCR amplified using primers AstD-pXP318-F and AstD-pXP318-R. All the 
introns were confirmed to be correctly removed by sequencing. 
Construction of S. cerevisiae strains. Plasmid pXW55 (URA3 marker) digested
with NdeI and PmeI was used to introduce the astA gene into S. cerevisiae RC01.
A 1.3-kb fragment containing astA obtained from PCR using primers AstA-xw55-recomb-F
and AstA-xw55-recomb-R was cloned into pXW55 using yeast homologous
recombination to produce pAstA-xw55. The plasmid pAstA-xw55 was then
transformed into S. cerevisiae RC01 to generate strain TY02.

Plasmid pXW06 (TRP1 marker) digested with NdeI and PmeI was used
to introduce the astB gene into S. cerevisiae RC01. A 1.6-kb fragment containing
astB obtained from PCR using primers AstB-xw06-recomb-F and AstB-xw06-recomb-R
was cloned into pXW06 using yeast homologous recombination to
produce pAstB-xw06. The plasmid pAstB-xw06 was then transformed into TY02
to generate strain TY03.

Plasmid pXW02 (LEU2 marker) digested with NdeI and PmeI was used
to introduce the astC gene into S. cerevisiae RC01. A 1.6-kb fragment containing
astC obtained from PCR using primers AstC-xw02-recomb-F and AstC-xw02-recomb-R
was cloned into pXW02 using yeast homologous recombination to
produce pAstC-xw02. The plasmid pAstC-xw02 was then transformed into TY03
to generate strain TY04.

The URA3 gene was inserted into the ILV3 locus of S. cerevisiae DHY ΔURA3
strain to generate UB01. An 879-bp homologous-recombination donor fragment
with 35-40 bp homologous regions flanking the ILV3 ORF was amplified using
primers ILV3p-URA3-F and ILV3t-URA3-R using yeast gDNA as a template. The
PCR product was gel purified and transformed into S. cerevisiae DHY ΔURA3,
and selected on uracil dropout medium to give UB01. The resulting strain was
subjected to verification using colony PCR with primers ILV3KO-ck-F and ILV3KO-ck-R
and the amplified fragment was confirmed with sequencing.

The URA3 gene inserted into the ILV3 locus of S. cerevisiae DHY ΔURA3 was
deleted from UB01 using homologous recombination to generate UB02. A 150-bp
homologous-recombination donor fragment with 75-bp homologous regions
flanking the ILV3 ORF was amplified using primers ILV3KO-F and ILV3KO-R,
gel purified, transformed into UB01, and counter-selected on 5-fluoroorotic acid
(5-FOA)-containing medium to give UB02. The resulting strain was subjected to
verification using colony PCR with primers ILV3KO-ck-F and ILV3KO-ck-R and
the amplified fragment was confirmed with sequencing.

The empty plasmid pXP318 (URA3 marker) was transformed into UB02 to
generate TY05.

Plasmid pXP318 digested with SpeI and XhoI was used as vector to introduce the
gene encoding AteDHAD into strain UB02. The cDNA of A. terreus NIH2624
served as the template for PCR amplification. A 1.7-kb fragment obtained
using primers AteDHAD-pXP318-F and AteDHAD-pXP318-R was cloned
into pXP318 using yeast homologous recombination to produce AteDHAD-pXP318.
Then, AteDHAD-pXP318 was transformed into UB02 to generate TY06.
AteDHAD was driven by a constitutive TEF1 promoter.

Plasmid pXP318 digested with SpeI and XhoI was used as vector to introduce the
astD gene into strain UB02. The cDNA isolated from TY01 served as the
template for PCR amplification. A 1.8-kb fragment obtained using primers AstD-pXP318-F
and AstD-pXP318-R was cloned into pXP318 using yeast homologous
recombination to make AstD-pXP318. A Flag tag was also added to the N terminus
of AstD. AstD-pXP318 was then transformed into UB02 to generate TY07. AstD
was driven by the constitutive TEF1 promoter.
Fermentation and compound analyses and isolation. A seed culture of S. cerevisiae
strain was grown in 40 ml of synthetic dropout medium for 2 days at 28°C, 250
r.p.m. Fermentation of the yeast was carried out using YPD (yeast extract 10 g l⁻¹,
peptone 20 g l⁻¹) supplemented with 2% dextrose for 3 days at 28°C, 250 r.p.m.

HPLC-MS analyses were performed using a Shimadzu 2020 EV LC-MS
(Phenomenex Luna, 5 μm, 2.0 × 100 mm, C-18 column) with positive and negative
mode electrospray ionization. The elution method was a linear gradient of 5-95%
(v/v) acetonitrile/water over 15 min, and then 95% (v/v) acetonitrile/water for
3 min with a flow rate of 0.3 ml min⁻¹. The HPLC buffers were supplemented with
0.05% formic acid (v/v). HPLC purifications were performed using a Shimadzu
Prominence HPLC (Phenomenex Kinetex, 5 μm, 10.0 × 250 mm, C-18 column).
The elution method was a linear gradient of 65-100% (v/v) acetonitrile/water in
25 min, with a flow rate of 2.5 ml min⁻¹. Gas chromatography-mass spectrometry
(GC-MS) analyses were performed using Agilent Technologies GC-MS 6890/5973
equipped with a DB-FFAP column. An inlet temperature of 240°C and constant
pressure of 4.2 psi were used. The oven temperature was initially set at 60°C, then
ramped up at 10°C min⁻¹ for 20 min and finally held at 240°C for 5 min.

To isolate compound 1, the fermentation broth of TY02 was centrifuged
(5,180g, 10 min), and the cell pellet was collected and soaked in acetone. The
organic phase was dried over sodium sulfate, concentrated to oil form and
subjected to silica column purification with hexane. To isolate compound 2, the
fermentation broth of TY03 was centrifuged (5,180g, 10 min), and the supernatant
was extracted three times with ethyl acetate. The organic phase was dried over
sodium sulfate, concentrated to oil form, and then subjected to HPLC purification.
To isolate aspterric acid, the fermentation broth of TY04 was centrifuged
(5,180g, 10 min), and the supernatant was extracted three times with ethyl acetate.
The organic phase was dried over sodium sulfate, concentrated to oil form, and
subjected to HPLC purification.
Structure determination of compounds. Compound 1, a colourless oil that readily
dissolved in hexane and chloroform, had a molecular formula C₁₅H₂₄, as
deduced from electron ionization-mass spectrometry (EI-MS) [M]⁺ m/z 204, and
showed [α]D = −30° (n-hexane; c = 0.1). GC-MS 70 eV, m/z (relative intensity):
204 [M]⁺ (42), 189 (5), 161 (35), 136 (100), 133 (10), 121 (70), 119 (25), 107 (20),
105 (27), 93 (21), 91 (26), 79 (13), 77 (15), 69 (20), 55 (12), 43 (12), 41 (13), 38 (21);
¹H NMR (500 MHz, CDCl₃): δ (p.p.m.) 5.37 (1H, m), 2.20-2.10 (5H, m), 2.10-2.00
(2H, m), 1.95 (1H, d, 15.3), 1.75 (3H, s), 1.71 (3H, q, 1.7), 1.61 (3H, br s), 1.44 (1H,
dd, 11.4, 7.2), 1.36 (1H, m), 1.31 (1H, dd, 11.3, 2.6), 0.73 (3H, s); ¹³C NMR (125
MHz, CDCl₃): δ 138.4, 138.3, 122.4, 122.2, 57.4, 42.6, 41.4, 40.3, 34.5, 29.6, 27.3,
25.0, 23.3, 20.6, 19.2. Both the NMR and mass spectra are identical to those of
the known compound (+)-daucane; however, the optical rotation is opposite,
which led to the assignment of 1 as (−)-daucane.

Compound 2, a colourless oil that readily dissolved in ethyl acetate and chloroform,
had a molecular formula C₁₅H₂₂O₃, as deduced from liquid chromatography-mass
spectrometry (LC-MS) [M+H]⁺ m/z 251, [M−H]⁻ m/z 249. ¹H NMR
(500 MHz, CDCl₃): δ 8.09 (1H, br s), 3.25 (1H, t, 7.4), 2.71 (1H, dd, 14.6, 6.5), 2.48
(1H, dd, 14.8, 6.3), 2.36 (1H, dd, 14.0, 6.6), 2.26 (1H, m), 2.15 (1H, dd, 16.3, 8.9),
2.08 (1H, d, 12.0), 1.84 (1H, q, 13.1), 1.73 (3H, d, 2.3), 1.59 (3H, d, 2.2), 1.48-1.35
(3H, m), 1.31 (1H, td, 11.5, 9.0), 0.86 (3H, s). ¹³C NMR (125 MHz, CDCl₃): δ 176.0,
135.8, 123.2, 60.1, 59.8, 59.4, 44.1, 40.5, 38.8, 30.6, 29.3, 24.9, 23.8, 20.6, 17.8.

Compound 3, a colourless oil that readily dissolved in acetone and chloroform,
had a molecular formula C₁₅H₂₂O₄, as deduced from LC-MS [M+H]⁺ m/z 267,
[M−H]⁻ m/z 265. ¹H NMR (500 MHz, CDCl₃): δ 4.29 (1H, d, 8.5), 3.92 (1H, d, 8.3),
3.48 (1H, d, 8.3), 2.42 (1H, dd, 14.9, 7.3), 2.37-2.28 (2H, m), 2.25 (1H, dd, 13.0, 4.4),
2.20-2.17 (1H, m), 2.12 (1H, d, 13.4), 2.01 (1H, m), 1.80-1.65 (2H, m), 1.71 (3H, s),
1.64-1.54 (1H, m), 1.60 (3H, s), 1.50 (1H, m); ¹³C NMR (125 MHz, CDCl₃): δ 178.2,
134.5, 125.2, 82.9, 76.3, 75.6, 55.4, 53.0, 36.6, 36.2, 33.8, 32.2, 23.6, 23.4, 20.9.
Compound 3 is identical to aspterric acid as reported.

Protein expression, purification and biochemical assay. To express and purify
AthDHAD, primers AthDHAD-pET-F and AthDHAD-pET-R were used to
amplify a 1.7-kb DNA fragment containing AthDHAD (AT3G23940). The PCR
product was cloned into pET28a using NheI and NotI restriction sites. The resulting
plasmid AthDHAD-pET was transformed into E. coli BL21 (DE3) to give TY08.
To express and purify AteDHAD (XP_001208445.1), primers AteDHAD-pET-F
and AteDHAD-pET-R were used to amplify a 1.6-kb DNA fragment containing
AteDHAD. The PCR product was cloned into pET28a using NdeI and NotI
restriction sites. The resulting plasmid AteDHAD-pET was transformed into
E. coli BL21 (DE3) to obtain TY09. To express and purify AstD (XP_001213593.1),
primers AstD-pET-F and AstD-pET-R were used to amplify a 1.6-kb DNA fragment
containing astD. The PCR product was cloned into pET28a using NdeI and
NotI restriction sites. The resulting plasmid AstD-pET was transformed into
E. coli BL21 (DE3) to obtain TY10. All DHADs with a fused 6×His-tag with a
molecular mass of ~62 kDa were expressed at 16°C, 220 r.p.m. for 20 h after induction
with 100 μM isopropyl β-D-1-thiogalactopyranoside (IPTG; added when
OD600 nm = 0.8). Cells from a 1-l culture were then collected by centrifugation at
5,180g at 4°C. The cell pellet was resuspended in 15 ml Buffer A10 (20 mM Tris-HCl
pH 7.5, 50 mM NaCl, 8% glycerol, 10 mM imidazole). The cells were lysed by sonication,
and the insoluble material was sedimented by centrifugation at 35,267g at
4°C. The protein supernatant was then incubated with 3 ml Ni-NTA for 4 h with
slow, constant rotation at 4°C. Subsequently the Ni-NTA resin was washed with
ten column volumes of Buffer A50 (Buffer A with 50 mM imidazole). For elution
of the target protein, the Ni-NTA resin was incubated for 10 min with 6 ml Buffer
A300 (Buffer A with 300 mM imidazole). The supernatant from the elution step
was then analysed by SDS-PAGE together with the supernatants from the other
purification steps. The elution fraction containing the recombinant protein was
buffer exchanged into storage buffer (50 mM Tris-HCl pH 7.2, 50 mM NaCl,
10 mM MgCl₂, 10% glycerol, 5 mM DTT, 5 mM GSH).

In vitro activity assays were carried out in 50 μl reaction mixture containing
storage buffer, 10 mM (±)-sodium α,β-dihydroxyisovalerate hydrate (4) and
0.5 μM of purified DHAD enzyme. The reaction was initiated by adding the
enzyme. After 0.5 h incubation at 30°C, the reactions were stopped by adding an
equal volume of ethanol. Approximately 0.1 volumes of 100 mM phenylhydrazine
(PHH) was added to derivatize the product 3-methyl-2-oxo-butanoic acid (5) into
6 at room temperature for 30 min. 20 μl of the reaction mixture was used for the
LC-MS analysis. The area of the HPLC peak with UV absorption at 350 nm was
used to quantify the amount of 6 (Extended Data Fig. 2).

The inhibition percentage of aspterric acid on DHADs was determined using
in vitro biochemical assays and calculated with the following equation:

$$\text{inhibition percentage} = 1 - \frac{\text{initial reaction rate with aspterric acid}}{\text{initial reaction rate without aspterric acid}}$$

Growth inhibition assay of S. cerevisiae on plates or in the tubes. S. cerevisiae was
grown in isoleucine, leucine and valine (ILV) dropout medium (20 g l⁻¹ glucose,
0.67 g l⁻¹ Difco Yeast Nitrogen Base without amino acids, 18 mg l⁻¹ adenine,
arginine 76 mg l⁻¹, asparagine 76 mg l⁻¹, aspartic acid 76 mg l⁻¹, glutamic acid
76 mg l⁻¹, histidine 76 mg l⁻¹, lysine 76 mg l⁻¹, methionine 76 mg l⁻¹, phenylalanine
76 mg l⁻¹, serine 76 mg l⁻¹, threonine 76 mg l⁻¹, tryptophan 76 mg l⁻¹,
tyrosine 76 mg l⁻¹). To test growth inhibition of aspterric acid on S. cerevisiae, cells
were incubated at 28°C until OD600 nm of the control strain without aspterric acid
treatment reached about 0.8. The ratio of yeast OD600 nm in medium with aspterric
acid treatment to yeast OD600 nm in medium without aspterric acid was used to
calculate the percentage of growth inhibition. The inhibition curve was plotted as the
percentage of inhibition versus aspterric acid concentration. To further show that aspterric
acid affects BCAA biosynthesis, isoleucine, leucine and valine were also supplemented
to the medium with or without treatment with aspterric acid. The growth curves
of TY05, TY06 and TY07 are plotted in Extended Data Fig. 4. The OD600 nm
was recorded every 20 min over a total of 50 h. The growth inhibition percentage
of aspterric acid on S. cerevisiae strains is calculated by dividing the cell density
(OD600 nm) of the aspterric acid-treated strain by that of the corresponding untreated strain
when OD600 nm of the untreated strain reaches approximately 0.8, using the following equation:


$$\text{growth inhibition percentage} = 1 - \frac{\text{OD}_{600\,\text{nm}}\ \text{of AA-treated strain}}{0.8}$$

in which 0.8 is the OD600 nm of the untreated strain.
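
As a rough sketch of how this equation could be applied to paired growth curves, the following Python helper finds the first time point at which the untreated culture reaches the reference OD and evaluates the treated culture at that same time point. The function name, argument names and example readings are hypothetical; the source specifies only the equation, the 0.8 reference value and the 20-min sampling interval.

```python
from typing import Sequence


def growth_inhibition(
    control_od: Sequence[float],
    treated_od: Sequence[float],
    reference_od: float = 0.8,
) -> float:
    """Return 1 - OD_treated / reference_od at the first time point where
    the untreated control reaches reference_od.

    The two sequences must be sampled at the same time points. Following
    the equation in the text, the denominator is the fixed reference value
    (0.8), not the exact control reading.
    """
    if len(control_od) != len(treated_od):
        raise ValueError("growth curves must have the same number of points")
    for control, treated in zip(control_od, treated_od):
        if control >= reference_od:
            return 1.0 - treated / reference_od
    raise ValueError("control culture never reached the reference OD")


# Hypothetical OD600 readings, for illustration only.
control = [0.1, 0.4, 0.8, 1.2]
treated = [0.1, 0.15, 0.2, 0.3]
inhibition = growth_inhibition(control, treated)
```

Using the fixed reference value keeps the sketch term-for-term consistent with the equation; dividing by the actual control reading instead would be a reasonable variant when the control overshoots 0.8 between readings.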

Growth inhibition assay of plants on plates or in the tubes. MS (2.16 g l⁻¹
Murashige and Skoog basal medium, 8 g l⁻¹ sucrose, 8 g l⁻¹ agar) medium was
used to test the growth inhibition of aspterric acid on A. thaliana, S. lycopersicum
and Z. mays. A. thaliana, S. lycopersicum and Z. mays were grown under long
day condition (16/8 h light/dark) using cool-white fluorescent bulbs as the light
source at 23°C. Aspterric acid was dissolved in ethanol and added to the medium
before inoculating strains or growing plants. The medium of the control treatment
contained the same amount of ethanol, but without aspterric acid.

Plant growth inhibition assay by spraying. Aspterric acid was first dissolved in
ethanol and then added to solvent (0.06 g l⁻¹ Finale (Bayer) with 20 g l⁻¹ EtOH).
The control plants were treated with solvent containing ethanol only. A. thaliana
that are resistant to glufosinate (containing the bar gene) were grown under long
day condition (16/8 h light/dark) using cool-white fluorescent bulbs as the light
source at 23°C. Spraying treatments began upon seed germination and were
repeated once every two days with approximately 0.4 ml aspterric acid solution
each time per pot.

Structure determination of holo-AthDHAD. The gene encoding AthDHAD
(residues 35-608) was cloned into pET21a derivative vector pSJ2 with an eight
histidine (8×His) tag and a TEV protease cleavage site at the N terminus. The
forward primer DHAD-F and the reverse primer DHAD-R were used for cloning.
The double mutant K559A/K560A for efficient crystallization was designed
using the surface entropy reduction prediction (SERp) server. Mutations were
generated by PCR using the forward primer K559AK560A-F and reverse primer
K559AK560A-R. All constructed plasmids were verified by DNA sequencing.

AthDHAD purified under aerobic conditions was found to contain no iron-sulfur
cluster (apo form). Hence we performed [2Fe-2S] cluster reconstitution
under a nitrogen atmosphere in an anaerobic box. The protein was incubated
with FeCl₃ at a ratio of 1:10 for 1 h on ice and then 10 equivalents of Na₂S per
protein was added drop-wise every 30 min for 3 h. The reaction mixture was then
incubated overnight. Excess FeCl₃ and Na₂S were removed using a Sephadex
G-25 Fine column (GE Healthcare).

The reconstituted holo-AthDHAD was crystallized in an anaerobic box. The
proteins (at 10 mg ml⁻¹) were mixed in a 1:1 ratio with the reservoir solution in a
2-μl volume and equilibrated against 50 μl reservoir solution, using the sitting-drop
vapour diffusion method at 16°C. Crystals for diffraction were observed in 0.1 M
sodium acetate pH 5.0, 1.5 M ammonium sulfate after 5 days.

All crystals were flash-cooled in liquid nitrogen after cryo-protection with a
solution containing 25% glycerol, 1.5 M ammonium sulfate, 0.1 M sodium acetate
pH 5.0. The data were collected at 100 K at Beam Line 19U1 of the Shanghai
Synchrotron Radiation Facility (SSRF). Diffraction data of holo-AthDHAD were
collected at the wavelength of 0.97774 Å. The best crystals diffracted to a resolution
of 2.11 Å. The Ramachandran favoured, allowed and outlier percentages are
98.05, 1.60 and 0.36, respectively. All datasets were indexed, integrated, and scaled
using the HKL3000 package**. The crystals belonged to space group P4,2)2. The 
statistics of the data collection are summarized in Extended Data Table 1. 

The holo-AthDHAD structure was solved by molecular replacement using
Phaser embedded in the CCP4i suite, with the L-arabinonate dehydratase
crystal structure (RCSB Protein Data Bank (PDB) ID: 5J83) as the search model.
All the side chains were removed during the molecular replacement process.
The resulting model was refined against the diffraction data using the REFMAC5
program of CCP4i. On the basis of the improved electron density, the side chains
of the holo-AthDHAD protein, iron-sulfur cluster, water molecule, acetate ion, sulfate
ions, and magnesium ion were manually built using the program WinCoot.
The Rwork and Rfree values of the structure are 17.27% and 21.52%, respectively.
The detailed refinement statistics are summarized in Extended Data Table 1. The
geometry of the model was validated by WinCoot. Structure factors and coordinates
of holo-AthDHAD have been deposited in the Protein Data Bank (PDB ID: 5ZE4).
Homology modelling of AstD and docking of substrate or aspterric acid into
the active site of holo-AthDHAD. The structure of holo-AthDHAD was prepared
using Schrödinger suite software under an OPLS3 force field. Hydrogen atoms
were added to reconstituted crystal structures according to the physiological
pH (7.0) with the PROPKA tool in the Protein Preparation tool in Maestro to optimize
the hydrogen bond network. Constrained energy minimizations were conducted
on the full-atomic models, with heavy-atom convergence to 0.5 Å. Homology
modelling was performed in Modeller 9.18, using the crystal structure of holo-AthDHAD
solved in this work as a template. Sequence alignment in Modeller
indicated that AstD and AthDHAD shared 56.8% sequence identity and 75.0%
sequence similarity (Extended Data Fig. 6). All the highly conserved residues and
motifs were properly aligned. A total of 2,000 models were generated for each target
in Modeller with the fully annealed protocol. The optimal models were chosen for
docking studies according to DOPE (Discrete Optimized Protein Energy) score.

All ligand structures were built in Schrödinger Maestro software. The LigPrep
module in Schrödinger software was used for geometric optimization
with an OPLS3 force field. The ionization states of ligands were calculated with
the Epik tool using Hammett and Taft methods in conjunction with ionization and
tautomerization tools. The docking of a ligand to the receptor was performed
using Glide. We included cofactors observed in the crystal structure during the
docking. As both water and SO₄²⁻ occupied the catalytic site, they were excluded
before docking. Cubic boxes centred on the ligand mass centre with a radius of
8 Å for all ligands defined the docking binding regions. Flexible ligand docking
was executed for all structures. Ten poses per ligand out of 20,000 were included
in the post-docking energy minimization. The best scored pose for the ligand
was chosen as the initial structure for further study. The molecular mechanics
energies combined with the generalized Born and surface area continuum solvation
(MM/GBSA) method was used to evaluate the ligand binding affinity on
the basis of the best-scored docking pose in Schrödinger software. Figures were
prepared in PyMOL and Inkscape. Both the native substrate α,β-dihydroxyisovalerate
and aspterric acid were docked into the catalytic site of AthDHAD. The
cross-section electrostatic surface map shows this unique catalytic pocket has a 
positively charged interior and a hydrophobic entrance, which binds to negatively 
charged ‘head’ and hydrophobic ‘tail’ of the substrate or aspterric acid, respectively. 
Thus the negatively charged ‘head’ can lead both of the substrate and aspterric acid 
into the catalytic chamber. The bulky hydrophobic tricyclic moiety of aspterric 
acid, however, provides stronger hydrophobic interactions to the entrance and 
blocks the entrance of the active site owing to the hydrophobic residues at the 
entrance (Fig. 2d). By contrast, the smaller ‘tail’ of the native substrate provides 
fewer interactions to the entrance because the smaller size limits efficient hydro- 
phobic contact to nearby residues. This implies that once aspterric acid binds to 
AthDHAD, it can prevent the substrate approaching the active site. We also used
the MM/GBSA method, a widely used approach for relative binding energy
calculation, to evaluate the relative binding affinity for both ligands. The MM/GBSA
calculations were done in Prime (Schrödinger 2015 suite). The MM/GBSA
energy was calculated using the following equation:
$\Delta G_{\text{bind}} = E_{\text{complex}} - E_{\text{protein}} - E_{\text{ligand}}$.
E denotes energy and includes terms such as protein-ligand van der Waals
contacts, electrostatic interactions, ligand desolvation, and internal strain (ligand
and protein) energies, using a VSGB2.0 implicit solvent model with the OPLS2005
force field. The solvent entropy is also included in the VSGB2.0 energy model, as it
is for other generalized Born and Poisson-Boltzmann continuum solvent models.
MM/GBSA calculation shows that the relative binding energy for aspterric
acid and α,β-dihydroxyisovalerate is −18.6 ± 0.3 kcal mol⁻¹ and −13.3 ±
0.2 kcal mol⁻¹, respectively, which indicates that the binding constant of aspterric
acid to the active site is about 6,000 times greater than that of α,β-dihydroxyisovalerate.
This further confirms that aspterric acid is a competitive inhibitor of AthDHAD.
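
The step from a binding-energy difference to a fold-difference in binding constant can be illustrated with the standard relation K ∝ exp(−ΔG/RT). The sketch below is not taken from the authors' workflow: the function name is hypothetical, and the temperature (298.15 K) is an assumption, because the text does not state the temperature used for the conversion. Treating MM/GBSA energies as free energies is itself an approximation.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
GAS_CONSTANT_KCAL = 1.987204e-3


def binding_constant_ratio(
    dg_a_kcal: float, dg_b_kcal: float, temperature_k: float = 298.15
) -> float:
    """Return K_a / K_b implied by two binding energies (kcal mol^-1).

    Uses K proportional to exp(-dG / RT), so the ratio depends only on the
    energy difference dG_a - dG_b. The default temperature is an assumption.
    """
    delta = dg_a_kcal - dg_b_kcal
    return math.exp(-delta / (GAS_CONSTANT_KCAL * temperature_k))


# Energies reported in the text: aspterric acid vs. the native substrate.
ratio = binding_constant_ratio(-18.6, -13.3)
```

With the reported energies, the 5.3 kcal mol⁻¹ difference corresponds to a ratio of several thousand at room temperature, the same order of magnitude as the roughly 6,000-fold figure quoted above; the exact value shifts with the assumed temperature.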
Cytotoxicity assay of aspterric acid. Cell proliferation experiments were
performed in a 96-well format (five replicates per sample) using the human
melanoma cell lines A375 and SK-MEL-1. Aspterric acid treatments were initiated
24 h after seeding and continued for 72 h, and cell survival was quantified using the
CellTiter-Glo assay (Promega).
Cross experiment of A. thaliana. To make male sterile A. thaliana, aspterric acid
was added to a chemical hybridization agent (CHA) formulation (250 μM aspterric
acid, 2% ethanol, 0.1% Tween-80, 1% corn oil in water), which has less inhibition
effect on the growth of A. thaliana. Flowers of the aspterric acid-treated Col-0 were
selected as the female parent. The non-treated A. thaliana containing a glufosinate
resistance gene were used as the male parent to donate pollen. Two-week-old F1
progeny resulting from the cross were treated with Finale (11.3% glufosinate-ammonium)
at a 1:2,000 dilution. The results are summarized in Extended Data Fig. 9.
Construction of the transgenic plants. The coding sequence of AstD was
codon optimized for A. thaliana. A chloroplast-localization signal (CLS) of
35 amino acid residues derived from the N terminus of A. thaliana DHAD
(MQATIFSPRATLFPCKPLLPSHNVNSRRPSIISCS) was fused to the N terminus
of the codon-optimized AstD. A 3×Flag-tag was inserted between the CLS and
the codon-optimized AstD (Supplementary Table 6). The gene block containing
CLS, the Flag-tag and astD was synthesized and then cloned into pEG202 vector
using Gateway LR Clonase II Enzyme Mix (Thermo Fisher Scientific). The
original CaMV 35S promoter of pEG202 was substituted by the ubiquitin-10 promoter
to drive the expression of AstD. The construct was electro-transformed into
Agrobacterium tumefaciens strain AGL0 and then transformed into A. thaliana using
the standard floral dip method. The A. thaliana Col-0 ecotype was transformed.
Positive transgenic plants were selected using the glufosinate resistance marker,
and were tested for survival in the presence of aspterric acid.

Protein expression verification with western blot. Approximately 0.5 g of leaf tissue
of transgenic A. thaliana was ground in liquid nitrogen. Proteins were homogenized
in 2× SDS buffer and then centrifuged at 21,000g for 5 min to remove
undissolved debris. The supernatant containing resolved proteins was loaded
onto a 4-12% Bis-Tris gel, and separated using MOPS running buffer. Transfer was
conducted using an iBlot2 dry transfer device and a PVDF membrane. The total
proteins were stained with Ponceau to demonstrate equal loading. Western blotting
was performed using Sigma monoclonal anti-Flag M2-Peroxidase antibody, with
detection using the Amersham ECL Prime detection reagent.

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The data that support the findings of this study are available 
within the paper and its Supplementary Information, or are available from the 
corresponding authors upon reasonable request. The structural factor and coor- 
dinate of holo-AthDHAD have been deposited in the Protein Data Bank under 
the ID 5ZE4. 
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Extended Data Fig. 1 | The rationale of resistance-gene-directed The biosynthetic core genes are shown in blue and the self-resistance 
discovery of a natural herbicide with a new mode of action. enzymes (SREs) are shown in red. The blockbuster cholesterol-lowering 
a, Phylogenetic tree of DHAD among bacteria, fungi and plants. The lovastatin drug targets HMG-CoA reductase (HMGR) in eukaryotes. In 
evolutionary history was inferred by using the neighbour-joining the fungus A. terreus that produces lovastatin, a second copy of HMGR 
method (MEGA7). Scale-bar units represent the number of amino encoded by ORFS is present in the gene cluster (top). The BGC of the 
acid substitutions per site. b, Representatives of small molecules that immunosuppressant mycophenolic acid from Penicillium sp. contains a 
inhibit DHAD in vitro, but fail to inhibit plant growth. c, Examples second copy of inosine monophosphate dehydrogenase (IMPDH), which 
of co-localization of biosynthetic gene clusters (BGCs) and targets. represents the SRE to this cluster (bottom). 
Extended Data Fig. 2 | Biochemical assays of DHAD functions.
a, Assaying DHAD activities in the conversion of the dihydroxyacid 4 into
the α-ketoacid 5. Formation of 5 can be detected with HPLC by chemical
derivatization using phenylhydrazine (PHH) to yield 6. b, LC-MS traces
of the biochemical assays of AthDHAD (plant DHAD, pDHAD). EIC of
positive ion mass of [M + H]+ = 207 is shown in red. Panels i-iv in b: i,
the derivatization reaction was validated by using the authentic 5; ii, the
bioactivity of AthDHAD in converting 4 into 5 was validated; iii, addition
of DMSO to AthDHAD enzymatic reaction mixture has no effect; and
iv, addition of 10 µM aspterric acid to the reaction mixture abolished
AthDHAD activity. The experiments were repeated independently three
times with similar results.
Extended Data Fig. 3 | Inhibition assay of different DHADs using
aspterric acid. a-c, Three DHAD enzymes were assayed, including
AthDHAD (plant DHAD, pDHAD), AteDHAD (fungal housekeeping
DHAD from A. terreus, fDHAD) and AstD (DHAD homologue within
ast cluster). IC50 and Ki values of aspterric acid were measured on the
basis of inhibition percentage at different aspterric acid concentrations.
Data are mean ± s.d. from three biologically independent experiments.
a, Plot of the inhibition percentage of 0.5 µM AteDHAD as a function
of aspterric acid concentration. b, Plot of the inhibition percentage of
0.5 µM AthDHAD as a function of aspterric acid concentration. c, Plot
of the inhibition percentage of 0.5 µM AstD as a function of aspterric
acid concentration. d, Analysis of inhibitory kinetics of aspterric acid on
AthDHAD using the Lineweaver-Burk method at different concentrations
of aspterric acid (left). Linear fitting of the apparent Michaelis constant
(Kapp) as a function of aspterric acid concentration yields the Ki of
aspterric acid on AthDHAD (right).
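
The linear fit described for panel d can be sketched as follows. This is a minimal illustration only, and it assumes a competitive inhibition model, in which the apparent Michaelis constant grows linearly with inhibitor concentration, Kapp = Km (1 + [I]/Ki). The function name and the numbers are hypothetical synthetic values, not measurements from this study.

```python
import numpy as np


def fit_ki_from_apparent_km(inhibitor_conc, km_app):
    """Fit Kapp = Km * (1 + [I]/Ki) by linear least squares.

    Assumes competitive inhibition (an assumption of this sketch).
    Returns (Km, Ki) in the same concentration units as the inputs.
    """
    conc = np.asarray(inhibitor_conc, dtype=float)
    kapp = np.asarray(km_app, dtype=float)
    # Straight line: Kapp = intercept + slope * [I]
    slope, intercept = np.polyfit(conc, kapp, 1)
    km = intercept            # Kapp at [I] = 0
    ki = intercept / slope    # because slope = Km / Ki
    return km, ki


# Hypothetical, noise-free synthetic data used only to exercise the fit.
true_km, true_ki = 5.0, 0.3
conc = np.array([0.0, 0.25, 0.5, 1.0])
km_app = true_km * (1.0 + conc / true_ki)

km, ki = fit_ki_from_apparent_km(conc, km_app)
print(f"Km = {km:.3f}, Ki = {ki:.3f}")
```

With real data the points scatter around the line, so the uncertainty of the slope and intercept should be propagated into the reported Ki.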
Extended Data Fig. 4 | Growth curve of S. cerevisiae ΔILV3 expressing
AstD and AteDHAD. a-d, The genome copy of DHAD encoded by ILV3
was first deleted from S. cerevisiae strain DHY ΔURA3 to give UB02.
UB02 was then complemented either chemically, by growth on ILV
(leucine, isoleucine and valine)-containing medium, or genetically, by
episomal expression of AteDHAD (fungal housekeeping DHAD from
A. terreus, fDHAD) or AstD (TY06 or TY07, respectively). The empty
vector pXP318 was also transformed into UB02 to generate a control
strain TY05. Cell growth (optical density) under different conditions was
plotted as a function of time. Data are mean ± s.d. from three biologically
independent experiments. a, Growth curve in ILV dropout medium with
no aspterric acid. b, Growth curve in ILV dropout medium with 125 µM
aspterric acid. c, Growth curve in ILV supplemented medium. d, Growth
curve in ILV supplemented medium with 250 µM aspterric acid.
Extended Data Fig. 5 | X-ray structure of holo-AthDHAD and
homology model of AstD. a, Superimpositions of the monomer of holo-AthDHAD
(PDB: 5ZE4, 2.11 Å) and RlArDHT (PDB: 5J84). The holo
structure contains the 2Fe-2S cofactor and Mg2+ ion in the active
site. The structure of holo-AthDHAD is in white; the crystal structure
of RlArDHT is in cyan. b, Superimpositions of holo-AthDHAD and
homology-modelled AstD. The structure of AstD was constructed by
homology modelling on the basis of the structure of holo-AthDHAD. The
structure of holo-AthDHAD is in white; the modelled structure of AstD is
in green. c, The electron density map of cofactors in the holo structure
of AthDHAD. White mesh indicates the 2Fo − Fc map at the 1.2σ level;
green mesh indicates the Fo − Fc positive map at the 3.2σ level; cyan sticks
represent the acetic acid molecule. d, Comparison of the active sites in
the crystal structure of AthDHAD and the modelled structure of AstD.
The cartoon represents superimposed binding sites of AthDHAD (white)
and AstD (green). The shift of a loop in AstD, where L518 (corresponding
to V496 in AthDHAD) is located, coupled with a larger L198 residue
(corresponding to I177 in AthDHAD) leads to a smaller hydrophobic
pocket in AstD than in AthDHAD. e, The surface of binding sites of AstD
(left) and AthDHAD (right). The smaller hydrophobic channel in the
modelled AstD cannot accommodate the aspterric acid molecule (yellow
ball and stick model).
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Identities normalised by aligned length. 
Colored by: identity + property 
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Extended Data Fig. 6 | Sequence alignment between AthDHAD and AstD. The sequence identity between AthDHAD and AstD is 56.8%, whereas the 
similarity between them is 75.0%. Residues were coloured according to their property and similarity. 
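
As a rough illustration of how a pairwise identity figure of this kind can be computed from an alignment, the sketch below counts identical residues over the aligned columns. It assumes that "aligned length" means columns in which both sequences carry a residue (no gap); other conventions exist, and the function name and the toy sequences are hypothetical rather than taken from the AthDHAD/AstD alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap.

    Both inputs must be rows of the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy alignment: 5 gap-free columns, 4 of them identical.
print(percent_identity("MKT-LV", "MRTALV"))
```

Similarity would be computed the same way, except that a column counts as a match when the two residues fall in the same property group under a chosen substitution scheme.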
Extended Data Fig. 7 | Spray assay of aspterric acid on A. thaliana.
Panel labels: solvent (left); 250 µM aspterric acid (AA) in solvent (right).
Glufosinate-resistant A. thaliana was treated with (right) or without (left)
aspterric acid in the solvent, which is a commercial glufosinate-based
herbicide marketed as Finale. To improve the wetting and penetration,
aspterric acid was first dissolved in ethanol and then added to the
solvent (0.06 g l−1 Finale (Bayer) with 20 g l−1 ethanol) to make 250 µM
aspterric acid spraying solution. The control plants were treated with
solvent containing ethanol only. Spraying treatments began upon seed
germination, and were repeated once every two days with approximately
0.4 ml aspterric acid solution per time per pot for four weeks. The picture
shown is taken after one month of treatment. The application rate of
aspterric acid is approximately 1.6 lb per acre, which is comparable to
the commonly used herbicide glyphosate (0.75-1.5 lb per acre). The
experiments were repeated independently three times with similar results.
Extended Data Fig. 8 | Specific inhibition of anther development in
A. thaliana. a-f, Comparison of flower organs between the aspterric
acid-treated (a-c) and non-treated (d-f) Arabidopsis. a, d, The aspterric
acid-treated flower shows abnormal pistil elongation owing to the lack of
pollination. b, e, The aspterric acid-treated flower is missing one stamen.
c, f, The aspterric acid-treated anther is depleted of healthy and mature
pollen. The experiments were performed twice with similar results.
Extended Data Fig. 9 | Schematic of results from the cross experiment.
a, Wild-type A. thaliana treated with 250 µM aspterric acid was pollinated
with pollen from the un-treated plant that carries the glufosinate-
resistance gene. Offspring was obtained, and inherited the glufosinate
resistance from the pollen donor. b, As in a, except that the pollen donor
was also treated with 250 µM aspterric acid. No offspring was obtained
from this cross. Similar results were obtained after treatment with 100 µM
aspterric acid.
Extended Data Table 1 | Data collection and refinement statistics (molecular replacement)

Data collection
  Space group
  Cell dimensions
    a, b, c (Å)
    α, β, γ (°)
  Resolution (Å)
  Rsym or Rmerge
  I/σI
  Completeness (%)
  Redundancy

Refinement
  Resolution (Å)
  No. reflections
  Rwork / Rfree
  No. atoms
    Protein
    Ligand/ion
    Water
  B-factors
    Protein
    Ligand/ion
    Water
  R.m.s. deviations
    Bond lengths (Å)
    Bond angles (°)

*Values in parentheses are for the highest-resolution shell.
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30.00-2.11 
33076 (1709) 
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4224 
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Prespliceosome structure provides insights into 
spliceosome assembly and regulation 


Clemens Plaschka, Pei-Chun Lin, Clément Charenton & Kiyoshi Nagai


The spliceosome catalyses the excision of introns from pre-mRNA
in two steps, branching and exon ligation, and is assembled from
five small nuclear ribonucleoprotein particles (snRNPs; U1, U2,
U4, U5, U6) and numerous non-snRNP factors. For branching,
the intron 5’ splice site and the branch point sequence are selected
and brought by the U1 and U2 snRNPs into the prespliceosome,
which is a focal point for regulation by alternative splicing factors.
The U4/U6.U5 tri-snRNP subsequently joins the prespliceosome to
form the complete pre-catalytic spliceosome. Recent studies have
revealed the structural basis of the branching and exon-ligation
reactions; however, the structural basis of the early events in
spliceosome assembly remains poorly understood. Here we report
the cryo-electron microscopy structure of the yeast Saccharomyces
cerevisiae prespliceosome at near-atomic resolution. The structure
reveals an induced stabilization of the 5’ splice site in the U1 snRNP,
and provides structural insights into the functions of the human
alternative splicing factors LUC7-like (yeast Luc7) and TIA-1 (yeast
Nam8), both of which have been linked to human disease. In
the prespliceosome, the U1 snRNP associates with the U2 snRNP
through a stable contact with the U2 3’ domain and a transient
yeast-specific contact with the U2 SF3b-containing 5’ region,
leaving its tri-snRNP-binding interface fully exposed. The results
suggest mechanisms for 5’ splice site transfer to the U6 ACAGAGA
region within the assembled spliceosome and for its subsequent
conversion to the activation-competent B-complex spliceosome.
Taken together, the data provide a working model to investigate the
early steps of spliceosome assembly.

To gain structural insights into early spliceosome assembly, we pre- 
pared the yeast prespliceosome A-complex on the UBC4 pre-mRNA 
that carries a mutation in the pre-mRNA branch point sequence, which
was previously used to stall the A-complex (UACUAAC to UACAAAC,
in which the penultimate A is the branch point adenosine and the U-to-A
change at the fourth position is the mutated nucleotide) (Extended Data
Fig. 1a, b). The purified A-complex contained
stoichiometric amounts of the U1 and U2 snRNP proteins (Extended
Data Fig. 1b), and was used to determine cryo-electron microscopy
(cryo-EM) densities of the A-complex at 4.0 Å (U1 snRNP, map A2)
and 4.9-10.4 Å (U2 snRNP, maps A1 and A3) resolution, respectively
(Extended Data Figs. 1c-e, 2). From these densities we built a
near-complete atomic model of the A-complex (Fig. 1, Supplementary
Videos 1, 2, Supplementary Data, Extended Data Fig. 1f), comprising
34 proteins, U1 and U2 snRNAs, and 34 nucleotides of pre-mRNA. The
final model lacks the mobile cap-binding complex, Prp5 and the U1
subunit Prp40 (Extended Data Fig. 1b, d, e; Extended Data Table 1). The
elongated U1 and U2 snRNPs bind the pre-mRNA 5’ splice site (5’SS)
and branch-point sequences, respectively, and associate in a parallel
manner to form the A-complex (Fig. 2a). The U1 snRNP structure
contains all the essential regions of the U1 snRNA and 16 proteins
(Fig. 1). The U1 snRNP ‘core’ is highly similar to its human counterpart
(Extended Data Figs. 3, 4), comprising the seven-membered Sm
ring and orthologues of the human U1 snRNP proteins (Snp1, human
U1-70k; Mud1, human U1A; Yhc1, human U1C), and is bound to
the peripheral yeast U1 proteins Luc7, Nam8, Prp39, Prp42, Snu56
and Snu71 (Extended Data Figs. 3, 4). The U2 snRNP has a bipartite
structure as observed in B-complex, comprising the SF3b subcomplex
(‘5’ region’) and the U2 3’ domain and SF3a subcomplex (‘3’ region’)
that are organized around the 5’ and 3’ regions of the U2 snRNA,
respectively (Figs. 1, 2a, Extended Data Fig. 5). At the current resolution,
the conformation of the U2 5’ region appears unchanged from the


Fig. 1 | Prespliceosome A-complex structure.
Two orthogonal views of the yeast A-complex
structure. Subunits are coloured according
to snRNP identity (U1, shades of purple;
U2, shades of green), and the pre-mRNA
intron (black) and its 5’ exon (orange) are
highlighted. The orthologous human protein
name is shown after the solidus. The location of
the cap-binding complex (CBC) is indicated by
a brown oval (see Extended Data Fig. 1e).
Fig. 2 | 5’SS recognition and implications for alternative splicing. a, The
A-complex U1-U2 snRNP interfaces (A and B) and the RNA network are
shown as cartoons, and are superimposed on the transparent surfaces of
the prespliceosome proteins. The U2 subunit Hsh155 surface (grey oval),
which interacts with the tri-snRNP in the B-complex, is freely accessible
in the A-complex. The U1 snRNP proteins Nam8 (orange, human TIA-1),
Luc7 (purple, human LUC7L), Prp39 (magenta, human PRPF39) and
Yhc1 (dark magenta, human U1C) and the U2 snRNP proteins Lea1 (light
green, human U2-A’), Rse1 (dark green, human SF3B3), and Prp9 (teal,
human SF3A3) are shown as ribbons. BP, branch point. b, The pre-mRNA
5’SS is recognized by the U1 snRNA 5’ end, and is stabilized by Luc7 and
Yhc1. Notably, the Yhc1 ZnF and Luc7 ZnF2 domains are arranged with
pseudo-C2 symmetry around the U1-5’SS helix. c, Nam8 binds the U1
snRNP through its linker (yellow), RNA recognition motif 3 (RRM3, light
orange) and C-terminal regions (orange), whereas its RRM1 and RRM2
domains are mobile and project towards the intron to bind uridine-rich
sequences downstream of the pre-mRNA 5’SS (dashed black line), as with
its human counterpart TIA-1. Nam8 contacts the Yhc1 (human U1C)
C terminus, and human TIA-1 biochemically also interacts with human
U1C. Snu56 (blue), Prp39 (magenta), Prp42 (violet), and Hsh49 (light
green) are shown as transparent ribbon models and other protein and U1
snRNA elements were removed for clarity.


B-complex, in which the pre-mRNA branch-point sequence is base-paired
with the U2 snRNA and the branch point adenosine is bulged
out and accommodated in a pocket formed by the U2 SF3b subunits
Hsh155 and Rds3. After we completed the A-complex structure, the
cryo-EM structure of the free yeast U1 snRNP was reported. This
model is in good agreement with the U1 snRNP in our A-complex
structure, but there are important differences.

The first ten nucleotides of U1 snRNA are disordered in the free U1
snRNP, but become ordered in our A-complex structure by pairing
with the pre-mRNA 5’SS (Fig. 2a, b). Additional density appeared
adjacent to the U1-5’SS helix, into which we could build a newly
ordered Yhc1 peptide (human U1C) that contacts the 5’SS phosphate
backbone (+5 and +6 positions, the ‘Yhc1-5’SS loop’) and a near-complete
model of Luc7 (in the previous study Luc7 was attributed
to what is now assigned as Snu71) (Extended Data Figs. 3a, c, 4a).
Although Luc7 is disordered in the free U1 snRNP, it associates stably
with the U1-5’SS helix in the A-complex (Extended Data Fig. 4a),
suggesting a mechanism for the selection of weak 5’SS sequences.
In our structure Luc7 is anchored by its N-terminal α-helix 1 to the
Sm ring subunit SmE, and its C3H-type zinc finger 1 (ZnF1) domain
binds where the 5’ exon emerges from the U1-5’SS helix, in excellent
agreement with RNA-protein crosslinks (Fig. 2b). The adjacent Luc7
C2H2-type ZnF2 contacts the U1-5’SS helix minor groove and the U1
snRNA phosphate backbone (nucleotides U5-C8). This interaction
mirrors that between the Yhc1 ZnF domain and the 5’SS nucleotides
+1 to +4 downstream of the 5’SS junction (Fig. 2b). Thus, Yhc1
and Luc7 make no base-specific interactions with the U1-5’SS helix,
and instead cradle the U1-5’SS helix phosphate backbone to stabilize
5’SS binding. Consistent with the structure, weakening of any of these
interactions can impair splicing and bypass the requirement for Prp28
helicase activity.

The A-complex structure reveals structural insights into the functions
of the human alternative splicing factors LUC7-like (LUC7L, yeast
Luc7) and TIA-1 (yeast Nam8) (Extended Data Fig. 4c, d). Luc7 and
its human homologues LUC7L1-3 are highly conserved, suggesting
that the LUC7L N-terminal α-helix also anchors it to the SmE protein
and that the invariant ZnF2 helix α8 similarly stabilizes the U1-5’SS
helix to promote the inclusion of weak alternative splice sites (Fig. 2b,
Extended Data Figs. 3c, 6a). The yeast U1 snRNP subunit Nam8 and its
human homologue TIA-1 contain three RNA recognition motif (RRM)
domains and a C-terminal Gln-rich extension (Extended Data Fig. 6b).
Human TIA-1 binds to uridine-rich sequences downstream of the 5’SS
predominantly through the RRM2 domain to allow the use of weak
5’SSs. The Nam8 RRM2 shows high sequence similarity to the TIA-1
RRM2, including the nearly identical RNP1 and RNP2 motifs, indicating
that Nam8 also binds uridine-rich sequences through its RRM2
(Extended Data Fig. 6b). In the A-complex structure the Nam8
RRM3 and its C-terminal region bind in a cavity of the Prp39-Prp42
heterodimer and contact the Yhc1 C-terminal region near the U1-5’SS
helix (Fig. 2c). From this location, Nam8 could project its mobile RRM2
domain to bind uridine-rich intron sequences downstream of the 5’SS,
consistent with crosslinking experiments, and thereby promote meiotic
pre-mRNA splicing (Fig. 2a, c).

In the A-complex, the U1 snRNP binds to the U2 snRNP through
two interfaces, A and B (Fig. 2a). In interface A, the N-terminal helices
α1-2 of the U1 protein Prp39 stably bind the U2 3’ domain subunit
Lea1 (human U2-A’) (Fig. 2a, Extended Data Fig. 5). The Prp39-Prp42
heterodimer binds Yhc1 to anchor the U2 snRNP 3’ domain to the
U1 snRNP. Similar interactions were observed biochemically between
the human alternative-splicing factor PRPF39 homodimer and U1C
(yeast Yhc1), suggesting that PRPF39 may contact the human U2 3’
domain in a similar manner, although it is not an obligate component
of the human A-complex (Fig. 2a). Different, non-overlapping Lea1
surfaces are used to interact with the NTC protein Syf1 in the yeast
C- and C*/P-complex conformations of the spliceosome (Extended
Data Fig. 5c), suggesting that Lea1 aids in the repositioning of the U2 3’
domain in multiple stages of splicing. Interface B is transient and found
only in a subset of cryo-EM images (Extended Data Figs. 2a, 5a, b). It
involves weak interactions between the yeast-specific U1 snRNA stem
loop 3-3 and the U2 SF3b Rse1 subunit β-propellers B and C (BPB and
BPC) and the C terminus of U2 SF3a Prp9. The pre-mRNA 5’SS and
branch point branching reactants are positioned approximately 150 Å
apart in the A-complex, with 40 nucleotides of the UBC4 intron looped
out in between (Fig. 2a, Extended Data Fig. 1e, f). The small interfaces
between the U1 and U2 snRNPs orient the snRNPs relative to each
other, and this may facilitate 5’SS transfer in the assembled spliceosome
and the subsequent dissociation of the U1 snRNP, consistent with the
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Fig. 3 | Spliceosome assembly and 5’SS transfer. a, One of the two 
alternative pre-B-complex models, suggesting that the U2 snRNP orients 
the U1 snRNP to deliver the pre-mRNA 5’SS to the U6 ACAGAGA stem. 
The model was obtained by superposing the yeast A- (from this study) and 
B-complex structures (RCSB Protein Data Bank code (PDB ID) 5NRL) 
and by modifying the locations of Brr2, U4 Sm ring, Sad1, and Prp28 to 
resemble a human-like pre-B-complex conformation on the basis of the 
biochemical data and the human U4/U6.U5 tri-snRNP structure (PDB ID 
3JCR) (see ‘Structural modelling’ in Methods). Colouring as in Fig. 1 


structural and biochemical data. Although the precise U1-U2 snRNP
interfaces may differ in the human A-complex, a key function of
U1-U2 (alternative) splicing factors could be to ensure that U1 and the
U1-5’SS helix are oriented correctly relative to the U2 snRNP.

Before A-complex formation, the yeast Msl5-Mud2 heterodimer
recognizes the branch point sequence through Msl5 and binds the U1
snRNP subunit Prp40 (human PRPF40) in the E complex, looping
out the intron between the 5’SS and branch point sequences
(Extended Data Fig. 4e). Although Prp40 was not identified in the free
U1 snRNP or in our A-complex structure, Prp40 crosslinks to Luc7
and Snu71 and unassigned cryo-EM density in the A-complex may
indicate its peripheral location near Luc7 (Extended Data Figs. 1e, 4a, e).
Msl5-Mud2 may then be destabilized by the Sub2 helicase, allowing the
Prp5 helicase to remodel U2 snRNA for the stable association of the U2
snRNP with the branch point sequence in the A-complex. Prp5 was
shown to physically interact with the U2 SF3b subunit Hsh155 HEAT
repeats 1-6 and 9-12 and with U2 snRNA at and surrounding the
branch point-interacting stem loop. Thus, after Prp5 activity, Prp5
needs to dissociate to fully expose the Hsh155 HEAT repeats 11-13
together with the U2 snRNA 5’ end in the A-complex, to allow for the
subsequent U4/U6.U5 tri-snRNP association to assemble the spliceosome
(Fig. 2a).

The A-complex structure also provides new insights into formation
of the fully assembled pre-B-complex spliceosome, which requires
integration of the tri-snRNP with the A-complex. The subsequent
Prp28 helicase-mediated transfer of the 5’SS from U1 to U6 snRNA
and destabilization of the U1 snRNP produces the B-complex spliceosome.
We first modelled a fully assembled yeast spliceosome, by
superimposing the U2 snRNP SF3b-containing domains of the yeast
A-complex (from this study) and the yeast B-complex structure. As
in the B-complex structure, the U2 snRNP would associate with
tri-snRNP via the U2/U6 helix II and Prp3 (Extended Data Fig. 7).
The modelling shows that the U1 snRNP would clash with large parts
of the Brr2-containing ‘helicase’ domain (‘U1-B-complex’; Extended
Data Figs. 7b, 8b), which may be relieved owing to their known
flexibilities (Extended Data Fig. 5a). However, the known binding site for
Prp28 at the U5 Prp8 N-terminal domain (Prp8N) observed in human
tri-snRNP would be sterically occluded by the pre-bound B-complex
proteins. We therefore considered an alternative model for the
assembled yeast ‘pre-B-complex’ spliceosome, by combining the available
data from yeast and human systems (Fig. 3a, Extended
Data Figs. 7a, 8a). First, the isolated human and yeast tri-snRNP
and a previously published work. b, The pre-B-complex RNA network
and the Prp28 helicase are shown as cartoons and are superimposed on
transparent surfaces of the spliceosome proteins. Prp28 is positioned at
the Prp8 N-terminal domain as in human tri-snRNP and may clamp
onto the pre-mRNA near the U1-5’SS helix to destabilize it and transfer
the 5’SS from U1 snRNA to the U6 snRNA ACAGAGA stem (red arrow),
which are separated by approximately 20 Å in the pre-B model. The
positions of proteins marked with asterisks are based on the human
tri-snRNP structure (PDB ID 3JCR).


structures differ in their protein composition and conformation,
indicating that different complexes accumulate at steady-state. In the
human tri-snRNP structure the BRR2 helicase is held near SNU114
by the SAD1 protein and PRP28 is bound to the PRP8 N-terminal
domain. In the yeast tri-snRNP and the yeast and human
B-complex structures Brr2 is repositioned and loaded onto its U4
snRNA substrate and the B-complex proteins replace Prp28 at the
Prp8N domain, ready for spliceosome activation. Second, in humans,
an ATPase-deficient PRP28 helicase stalls spliceosome assembly at
the pre-B-complex stage, before disruption of the U1-5’SS interaction,
and this complex comprises the U1 and U2 snRNPs, a loosely
associated tri-snRNP, and SAD1. Third, in yeast, Sad1 is essential
for splicing and is very transiently associated with the tri-snRNP.
Given the high conservation of the major spliceosome components in
yeast and humans, the yeast spliceosome may likewise assemble with
a human-like tri-snRNP that contains Prp28, Sad1 and a repositioned
Brr2 helicase. On the basis of these assumptions, we modelled a
yeast pre-B-complex spliceosome that comprises all five snRNPs with
a combined molecular mass of approximately 3.1 megadalton and with
only minor clashes (Fig. 3a, Extended Data Fig. 7a, b). Notably, this
model indicates that the U2 snRNP positions the U1 snRNP to deliver
the U1-5’SS helix to the exposed U6 ACAGAGA stem in tri-snRNP,
only approximately 20 Å away from where Prp28 is likely to mediate
5’SS transfer, consistent with protein-RNA crosslinks (Fig. 3b). This
suggests that the subsequent repositioning of the Brr2 helicase onto
the U4 snRNA, observed in the B-complex structure, would coincide
with the release of the U1 snRNP owing to a steric clash, rendering
Brr2 competent for spliceosome activation only after successful 5’SS
transfer (Extended Data Figs. 7b, 8a). The model thus indicates a new
molecular checkpoint to couple 5’SS transfer with U1 snRNP release
and formation of the B-complex (Extended Data Figs. 7b, 8a).

In summary, the prespliceosome structure reveals how the U1 and 
U2 snRNPs recognize the two reactants of the branching reaction and 
associate together with the tri-snRNP into the fully assembled splice- 
osome. The results further suggest how the human alternative-splicing 
factors LUC7L and TIA-1 may influence splice-site selection. 
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Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
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METHODS 


Prespliceosome preparation and purification. To obtain the prespliceosome
A-complex for structural studies, we prepared yeast S. cerevisiae containing
a genomic TAPS affinity tag on the U2 snRNP subunit Hsh155, essentially as
described. Yeast cells were grown in a 120-l fermenter, and splicing extract was
prepared using the liquid-nitrogen method, essentially as described. Capped
UBC4 pre-mRNA containing a point mutation (U > A) two nucleotides upstream
of the branch point adenosine and three MS2 stem loops at the 3’ end was
produced by in vitro transcription. The RNA product was labelled with Cy5 at its
3’ end to monitor complex purification. The pre-mRNA substrate was bound
to the MS2-MBP fusion protein and added to an in vitro splicing reaction
carried out for 90 min at 23 °C, essentially as described. The reaction mixture was
then centrifuged through a 40% glycerol cushion in buffer A (20 mM HEPES
(pH 7.9), 50 mM KCl, 0.2 mM EDTA, 1 mM dithiothreitol (DTT), 0.04% NP-40).
The cushion was diluted with buffer A containing 1% glycerol, and applied to
amylose resin (NEB) pre-washed with buffer B (20 mM HEPES (pH 7.9), 75 mM
KCl, 5% glycerol, 0.2 mM EDTA, 1 mM DTT, 0.03% NP-40). After 12 h incubation
at 4 °C, the resin was washed with buffer B and eluted in buffer B containing 50 mM
KCl and 12 mM maltose. Fractions containing A-complex were pooled and applied
to Strep-Tactin resin (GE Healthcare), pre-washed with buffer B, and incubated for
4 h at 4 °C. The resin was washed with buffer B containing 2 mM MgCl2 and eluted
with buffer B containing 50 mM KCl, 2.5 mM desthiobiotin, and 2 mM MgCl2.
The A-complex fractions were pooled and crosslinked using 1.1 mM BS3 (Sigma)
on ice for 1 h, and subsequently quenched with 50 mM ammonium bicarbonate.
The sample was concentrated to ~0.4 mg ml−1 and immediately used for EM
sample preparation. Mass spectrometry (data not shown) indicated that
homogeneous A-complex was purified, containing sub-stoichiometric amounts of Prp5
(Extended Data Fig. 1b). The splicing assay in Extended Data Fig. 1a was carried
out as for A-complex purification, but in a volume of 25 µl and in the absence of
MS2-MBP fusion protein, and was visualized after 30 min of splicing at 23 °C on a
denaturing 14% polyacrylamide TBE gel with a Typhoon scanner (GE Healthcare).
Electron microscopy. For cryo-EM analysis the A-complex sample was applied
to R2/2 holey carbon grids (Quantifoil), precoated with a 5-7-nm homemade
carbon film. Grids were glow-discharged for 20 s before deposition of 2.5 µl sample
(~0.4 mg ml−1), and subsequently blotted for 2-3.5 s and vitrified by plunging into
liquid ethane with a Vitrobot Mark III (FEI) operated at 4 °C and 100% humidity.
Cryo-EM data were acquired on three separate FEI Titan Krios microscopes
(datasets one to three) operated in EFTEM mode at 300 keV, each equipped with a K2
Summit direct detector (Gatan) and a GIF Quantum energy filter (slit width of
20 eV, Gatan). Datasets one and two were recorded using ‘Krios 1’ and ‘Krios 2’
at the MRC-LMB, respectively, and dataset three using ‘Krios 2’ at the Astbury
Biostructure Laboratory (University of Leeds). For dataset one, 5,935 movies were
acquired using EPU (FEI) with a defocus range of −0.4 µm to −4.4 µm at a nominal
magnification of 105,000× (1.13 Å pixel−1). The camera was operated in ‘counting’
mode with a total exposure time of 13 s fractionated into 20 frames, a dose rate of
4.25 e− pixel−1 s−1, and a total dose of 43 e− Å−2 per movie. Dataset two was
collected in the same manner, except that 727 movies were recorded using SerialEM,
at a nominal magnification of 105,000× (1.14 Å pixel−1), a total exposure time of
8 s fractionated into 20 frames, a dose rate of 4.33 e− pixel−1 s−1 and a total dose
of 27 e− Å−2 per movie. Dataset three was collected with EPU (FEI) similar to
dataset one, except that 2,745 movies were collected at a nominal magnification of
130,000× (1.07 Å pixel−1), a total exposure time of 8 s fractionated into 20 frames,
a dose rate of 7.94 e− pixel−1 s−1 and a total dose of 56 e− Å−2 per movie.
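
The quoted total doses follow from the dose rate, the exposure time and the pixel size. The short sketch below reproduces that arithmetic as a consistency check; the function name is hypothetical, and the inputs are the values stated above. The computed doses agree with the reported ones to within about one electron per square ångström, which is compatible with rounding of the stated dose rates and pixel sizes.

```python
def total_dose_e_per_A2(dose_rate_e_per_pixel_s: float,
                        exposure_s: float,
                        pixel_size_A: float) -> float:
    """Total electron dose per square angstrom for one movie.

    dose [e-/A^2] = dose rate [e-/pixel/s] * exposure [s] / pixel area [A^2/pixel]
    """
    return dose_rate_e_per_pixel_s * exposure_s / pixel_size_A ** 2


# (dose rate e-/pixel/s, exposure s, pixel size A, reported total dose e-/A^2)
datasets = {
    "one": (4.25, 13.0, 1.13, 43.0),
    "two": (4.33, 8.0, 1.14, 27.0),
    "three": (7.94, 8.0, 1.07, 56.0),
}

for name, (rate, exposure, pixel, reported) in datasets.items():
    dose = total_dose_e_per_A2(rate, exposure, pixel)
    print(f"dataset {name}: computed {dose:.1f} e-/A^2, reported {reported:.0f}")
```

Dividing the total dose by the 20 frames gives the per-frame dose that a dose-weighting scheme would use.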
Image processing. Movies were aligned using MOTIONCOR2 with 5 × 5 patches
and applying a theoretical dose-weighting model to individual frames. Contrast
transfer function (CTF) parameters were estimated using Gctf. Resolution is
reported on the basis of the gold-standard Fourier shell correlation (FSC) (0.143
criterion) as described, and B-factors were determined and applied automatically
in RELION 2.1. Particles from dataset one were automatically picked
using Gautomatch (K. Zhang) and screened manually, and were then extracted in
RELION with a 560² pixel box size and pre-processed. Particles from datasets two
and three were picked and pre-processed in the same way, and were then rescaled
to the pixel size of dataset one (1.13 Å pixel−1) in RELION 2.1 by Fourier cropping
during particle extraction with a 560² pixel box. For rescaling, we first calculated
3D refinements in RELION 2.1 for each dataset (one to three) and performed
real space correlation fits in UCSF Chimera to identify scaling factors for datasets
two and three relative to dataset one. Because the absolute magnification values
differed slightly for the different microscopes, we re-determined the CTF values for
datasets two and three using the new pixel sizes with Gctf, and then re-extracted
and rescaled the particles to the 560² pixel box. Combining datasets one to three
yielded a total dataset of 406,272 particles that were used for subsequent processing.
The first 22,319 particles from dataset one were used to generate an ab 
initio 3D reference for the A-complex using default parameters and three classes 
in cryoSPARC (Extended Data Fig. 2a). The complete dataset (one to three) was 
subjected to a ‘heterogeneous’ (multi-reference) refinement in cryoSPARC using 
default parameters and four classes: the ab initio A-complex reference and three 
‘junk’ references (Extended Data Fig. 2a; round 1). Class one contained 153,570 
particles (37.8%, percentage of particles from the full dataset) and was used for 
a 3D refinement in RELION 2.1 with a soft mask in the shape of the A-complex. 
This yielded a density (map A1) with an overall resolution of 4.9 Å and a B-factor 
of −188 Å², comprising U1 snRNP and the U2 snRNP 3’ region (Extended Data 
Figs. 1e, d, 2, 9). To improve the U1 snRNP density, we prepared a soft mask 
enveloping the U1 snRNP with the volume eraser in UCSF Chimera and RELION 
2.1. This enabled the focused refinement of the U1 snRNP (map A2) from the 
same 153,570 particles to an overall resolution of 4.0 Å and a B-factor of −146 Å² 
(Extended Data Figs. 1e, d, 2, 9). In the A-complex the U2 snRNP 5’ region is 
flexible relative to the U1 and the U2 3’ region (Extended Data Fig. 2). To position 
the U2 snRNP 5’ region in the A-complex, we used a soft mask surrounding the 
U2 5’ region and carried out 3D classification without image alignment with six 
classes (Extended Data Fig. 2a; round 2). This revealed a class with defined U2 5’ 
region from 19,937 particles (4.9%) that could be refined to an overall resolution of 
10.4 Å (Extended Data Figs. 2, 9). Local resolution was estimated using ResMap 
(Extended Data Fig. 2d, e). 
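The rescaling of datasets two and three by Fourier cropping, described earlier in this section, can be illustrated with a minimal NumPy sketch. This is not RELION's implementation: the function names are hypothetical, the refined scaling factors obtained from the UCSF Chimera fits are not given in the text, and the nominal pixel sizes (1.07 Å for dataset three, 1.13 Å for dataset one) are used instead as an assumption.

```python
import numpy as np


def fourier_crop(image, new_box):
    """Downsample a square 2D image by keeping only the central (low-frequency)
    part of its Fourier transform. Hypothetical helper for illustration."""
    old_box = image.shape[0]
    ft = np.fft.fftshift(np.fft.fft2(image))           # move zero frequency to the centre
    start = old_box // 2 - new_box // 2
    cropped = ft[start:start + new_box, start:start + new_box]
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return out * (new_box / old_box) ** 2               # keep the mean intensity unchanged


def source_box(target_box, target_pixel, source_pixel):
    """Even box size to extract at the source pixel size so that Fourier cropping
    to target_box gives approximately target_pixel."""
    n = target_box * target_pixel / source_pixel
    return 2 * int(round(n / 2))


# Dataset three (nominal 1.07 A per pixel) rescaled towards dataset one (1.13 A per pixel).
box_in = source_box(560, 1.13, 1.07)
effective_pixel = 1.07 * box_in / 560
rescaled = fourier_crop(np.ones((box_in, box_in)), 560)
print(box_in, round(effective_pixel, 4), rescaled.shape)
```

Because the extracted box must be an integer (here even) number of pixels, the effective pixel size after cropping only approximates the target; this is one reason the CTF parameters need to be re-estimated with the new pixel size, as the text describes.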

Structural modelling. We prepared a composite model of the A-complex by 
combining the A1-3 densities (Extended Data Fig. 1e, f). Model building was carried 
out in COOT. The U1 snRNP coordinates were refined into the sharpened A2 
density in PHENIX using the phenix.real_space_refine routine, and applying 
secondary structure, rotamer, nucleic acid and metal ion restraints. Homology 
models for yeast Yhc1, Snp1, and Mud1 were generated using MODELLER from 
the human U1 snRNP crystal structures (PDB ID 4PJO, 4PKD) and were fitted 
and manually adjusted in the A2 map. The yeast B-complex U5 Sm ring model 
was used as the initial model for the U1 Sm ring, and was manually adjusted in 
the A2 density. Initial models for Prp39 and Prp42 were generated by I-TASSER 
and were subsequently adjusted and extended manually. The Prp39 N-terminal 
residues 47-339 were modelled as poly-alanine owing to a lower local resolution of 
~5-6 Å (Extended Data Figs. 2d, e, 3c). Snu56, the Yhc1 C terminus and the Snu71 N 
terminus were modelled de novo; Yhc1 residues 48-82 and 135-142 were modelled 
as poly-alanine. To build the Luc7 model, a C3H-type ZnF (from PDB ID 1RGO) 
for ZnF1 and a C2H2-type ZnF (from Yhc1) for ZnF2 were used to guide modelling 
in the A2 density, with a local resolution of 4-5 Å (Extended Data Fig. 3c). 
The helices connecting Luc7 ZnF1 and ZnF2 (α5-7) were modelled as poly-alanine, 
and were assigned on the basis of density connectivity. The U1 snRNP 
protein model is in excellent agreement with biochemical and protein crosslinking 
results. The U1 snRNA model was generated on the basis of similarity to the U1 
snRNA in the human U1 snRNP crystal structures (PDB ID 3CW1, 4PJO, 4PKD) 
and according to the yeast U1 snRNA secondary structure prediction. All 
base-pairing U1 snRNA regions (helix H; SL1; SL2-1 and -2; SL3-1, -2, -3, -4, -5 and -6), 
except for the SL3-7 and the tip of SL3-3, were modelled (Extended Data Fig. 3f, g). 
The human SL1 loop (PDB ID 4PKD) was rigid-body-fitted together with the 
homology model of the yeast Snp1 (described above), and the human U1 snRNA 
sequence was replaced with that of yeast. The loops connecting SL2-1 to SL2-2 as 
well as SL3-3 to SL3-4 and SL3-4 to SL3-5 and the tips of SL2-2, SL3-3, -4 and -5 
were not built, owing to a lower local resolution (~4.5 Å). The location of a region 
of U1 snRNA SL3-7 was modelled as a phosphate backbone only and may 
correspond to the sequence surrounding residues 378-391 and 428-440. The U1-5’SS 
was modelled de novo, and the UBC4 pre-mRNA contained 12 nucleotides, ten 
from the intron (+1 to +10) and two from the 5’ exon (−1 to −2). 

The U2 snRNP 3’ region (U2 3’ domain and SF3a subcomplexes) from the yeast 
B-complex structure (PDB ID 5NRL) was fitted into the A1 density using UCSF 
Chimera, and the positions of Lea1, Msl1 and U2 snRNA residues 139-1169 
were adjusted as a rigid body in COOT. The U2 snRNP 5’ region from the yeast 
B-complex structure (PDB ID 5NRL) was fitted into the A3 density in UCSF 
Chimera. This provided an excellent fit, suggesting that the U2 5’ region structure 
is not changed substantially from that observed in the yeast B-complex. To 
generate the complete A-complex model, the refined U1 snRNP model and the U2 
snRNP 3’ region were fitted into the A3 density in UCSF Chimera, together with 
the fitted U2 snRNP 5’ region. The final model comprises 34 proteins, U1 and U2 
snRNAs, and the pre-mRNA substrate. 

To generate the alternative pre-B-complex model shown in Fig. 3, we modified 
and combined structural models using COOT, on the basis of structural and 
biochemical data from yeast and human systems. We first superimposed 
our A-complex structure on the yeast B-complex structure using the U2 
SF3b-containing domain. The free human tri-snRNP structure (PDB ID 3JCR), which 
may resemble the pre-B conformation, was used to model the yeast tri-snRNP 
in the pre-B-complex conformation. We first removed the B-complex proteins 
from the yeast B-complex structure, because these are absent in the purified human 
pre-B-complex. Human pre-B instead contained the PRP28 helicase and SAD1, 
and we therefore placed crystal structures of the yeast Prp28 helicase (PDB ID 
4W7S) and yeast Sad1 (PDB ID 4MSX) in their human tri-snRNP locations. We 
then positioned the U4 Sm ring and Brr2 as in the human tri-snRNP structure, in 
which the Brr2 PWI domain makes a conserved contact with Sad1. We removed 
a Snu66 peptide bound to Brr2 from the model, because its binding at this site is 
uncertain in the pre-B-complex conformation. Several minor differences remain 
between the free human tri-snRNP structure and the pre-B-complex model, and 
these were not modelled. The final pre-B model contained only minor clashes, and 
one observed clash between the highly flexible Prp28 RecA-2 lobe and the flexible 
U6 snRNA 5’ stem loop could be resolved by a minor repositioning of either 
domain. The final pre-B model comprises 66 proteins, five snRNAs and the pre-mRNA 
substrate, and has a combined molecular mass of ~3.1 MDa. 

Figures were generated with PyMol (https://www.pymol.org) and UCSF 
Chimera. 
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 
Data availability. Three-dimensional cryo-EM density maps A1, A2 and A3 have 
been deposited in the Electron Microscopy Data Bank under the accession num- 
bers EMD-4363, EMD-4364 and EMD-4365, respectively. The coordinate file of 
the A-complex has been deposited in the Protein Data Bank under the accession 
number 6G90. 
Extended Data Fig. 1 | Biochemical characterization and cryo-EM of the 
prespliceosome A-complex. a, Mutation of the UBC4 pre-mRNA branch 
point sequence (UACUAAC to UACAAAC, in which A is the branch-point 
adenosine and A is the mutated nucleotide) stalls splicing before the first 
step, as described. Splicing reactions were carried out for 30 min at 23 °C 
in yeast extract using wild-type (lane one) or mutant (U > A, lane two) 
pre-mRNA. This experiment was performed three times. The asterisk 
indicates a degradation product. For gel source data see Supplementary 
Fig. 1a. b, Protein analysis of purified A-complex (SDS-PAGE stained with 
Coomassie blue). The U2-associated Prp5 protein is sub-stoichiometric 
and not observed in the A-complex structure. The purification and 
analysis of protein compositions were performed at least five times 
with similar results. For gel source data see Supplementary Fig. 1b. 
c, Cryo-EM micrograph of the A-complex. Scale bar, 100 nm. d, 2D class 
averages of the A-complex were determined in RELION 2.1, and 
reveal a bipartite architecture, comprising the U1 snRNP and the U2 
snRNP 3’ and 5’ regions, respectively. e, Composite cryo-EM density of 
the A-complex shown in two orthogonal views (compare to Fig. 1). The 
respective densities used for modelling the U1 snRNP (A2, grey), the U2 
3’ region (A1, cyan), and the U2 5’ region (A3, green) are coloured and 
superimposed on a transparent outline of the full A3 map (Methods). The 
overall resolution of each map as well as the percentage from the cleaned 
dataset of 153,556 particles are shown in parentheses. Non-modelled 
regions are indicated and putatively assigned. f, Composite cryo-EM 
density with the final A-complex model superimposed in a cartoon 
representation. The path of 40 nucleotides of the disordered UBC4 
pre-mRNA intron is indicated. A-complex components are coloured as in 
Fig. 1. Views as in e. 
Extended Data Fig. 2 | Cryo-EM image classification and refinement. 
a, Image processing workflow for analysis of the A-complex cryo-EM 
dataset (see ‘Image processing’ in Methods). To visualize differences 
between the reconstructions, the U1 snRNP (grey), U2 3’ (cyan) and U2 
5’ regions (green) are coloured. For each round of three-dimensional 
classification, the percentage of the data and the type of soft-edged mask 
are indicated. The type of mask and overall resolution are indicated for 
each 3D refinement (blue box). b, Orientation distribution plots for 
all particles that contribute to the respective A1, A2, and A3 cryo-EM 
reconstructions. c, Gold-standard Fourier shell correlation (FSC = 0.143) 
of the respective A1, A2 and A3 cryo-EM reconstructions. d, Two views 
of the composite A-complex cryo-EM density (maps A1, A2 and A3) 
coloured by local resolution as determined by ResMap. e, As panel d, but 
for a central slice through the composite A-complex cryo-EM map. 
Extended Data Fig. 3 | Details of the U1 snRNP. a, U1 snRNP structure 
with subunits coloured as in Fig. 1, except for Nam8 (orange), Snu56 
(light blue), Snu71 (blue), Luc7 (dark purple), Mud1 (red) and the U1 
snRNA (various). The pre-mRNA nucleotides are labelled relative to the 
first nucleotide (+1) of the intron. The Nam8 RRM1 and RRM2 domains 
are flexible and project downstream of the 5’SS. The protein attributed to 
Luc7 in the free U1 snRNP structure was re-assigned to Snu71. C-term, 
C terminus; N-term, N terminus; SL, stem loop. In the structure we do not 
observe any evidence that the C-terminal tails of SmB, SmD1, and SmD3 
interact with the 5’SS, consistent with their absence in the human 
5’SS-minimal U1 snRNP crystal structure. b, Representative regions of the 
sharpened U1 snRNP density determined at 4 Å resolution (map A2) are 
superimposed on the refined coordinate model. The density reveals 
side-chain details, and here segments from the Prp42 N terminus (TPR repeat 1), 
the Sm ring subunit SmB, and the Snu56 α-helical domain are shown. 
c, The A2 cryo-EM density is shown superimposed on the coordinate 
models of a selection of U1 snRNP proteins: Luc7, Snu71, Yhc1 and Prp39. 
In the structure most of Snu71 is disordered, except for a small N-terminal 
domain (residues 2-43) that binds between the Prp42 N terminus and the 
Snu56 KH-like fold, consistent with protein crosslinking. Functional 
regions and disordered domains are indicated. d, The U1 snRNA-pre-mRNA 
5’ splice site (U1-5’SS) model is superimposed on its cryo-EM density 
(map A2). A secondary structure diagram of the U1-5’SS interaction is 
shown underneath the model. The register of the U1-5’SS is shifted by one 
nucleotide with respect to U1C (Yhc1) compared to the minimal human 
5’SS-U1 snRNP crystal structure, owing to an additional nucleotide in the 
yeast U1 snRNA (U11). Lines indicate Watson-Crick base pairs and dots 
indicate pseudouridine (Ψ)-containing base pairs. e, The Prp39-Prp42 
heterodimer is coloured to indicate each of their respective TPR repeats. 
f, Cryo-EM density of U1 snRNA from maps A2 (dark grey) and A3 (light 
grey) without (top) and with the superimposed coordinate model of yeast 
U1 snRNA (bottom). The model is labelled and coloured according to 
functional regions of U1 snRNA (5’ end, pink; H helix, cyan; SL1, dark 
blue; SL2-1, green; SL3-1, light blue; SL2-2 and SL3-2 to -7, grey; 3’ end 
and Sm site, yellow). g, Secondary-structure diagram of U1 snRNA. 
Bold letters indicate residues included in the model, lines indicate 
Watson-Crick base pairs, and dots indicate G-U wobble and 
pseudouridine-containing base pairs. Compare to e. The conserved U1 snRNA ‘core’ is 
outlined with a grey box. The region of the putative phosphate backbone 
model of part of the U1 SL3-7 region is indicated with a grey box. 
Extended Data Fig. 4 | Comparisons of yeast and human U1 snRNPs 
and implications for alternative splicing. a, Formation of the U1-5’SS 
helix induces stable binding of Luc7. In the absence of a pre-mRNA 5’SS 
in the free U1 snRNP density (left, EMD-8622), Luc7 and the U1 5’ end 
are disordered. Upon 5’SS recognition at the U1 5’ end (centre, map A2), 
Luc7 becomes ordered and stabilizes the U1-5’SS interaction, suggesting 
a mechanism for the selection of weak 5’SS sequences. The free U1 snRNP 
and the 5’SS-bound (map A2) cryo-EM densities are superimposed on 
the right. Although the long α-helical density next to Luc7 cannot be 
assigned with confidence, protein-protein crosslinking data and protein 
secondary structure prediction are consistent with the presence of either 
Prp40 or Snu71. On the basis of additional biochemical data on the 
interaction between the α-helical Prp40 FF1 domain and Luc7 ZnF2, we 
would speculate that the Prp40 FF1 domain is the most likely candidate 
for this density. b, Comparison of the yeast U1 snRNP ‘core’ with the 
human U1 snRNP crystal structure (PDB ID 3CW1). Protein and RNA 
(top) and RNA only (bottom) are shown side by side (left and centre) 
and superimposed by a global alignment in PyMOL (right). Coloured 
as in Extended Data Fig. 3a. c, The yeast U1 snRNP model suggests 
regulatory mechanisms for human alternative splicing factors. The human 
homologues of the peripheral yeast U1 proteins may function through 
stabilization of the U1-5’SS interaction (region 1), of the U1-U2 3’ region 
interface (region 2), or the U1-U2 5’ interface (region 3). The yeast U1 
snRNP ‘core’ is shown superimposed on a surface representation of the 
U1 snRNP model (top), compared with the similarly coloured human U1 
snRNP (below). Interaction sites with the U2 snRNP are labelled (top). 
d, The locations of yeast U1 snRNP components with homology to human 
splicing factors are indicated in the U1 snRNP structure. The Prp39- 
Prp42 heterodimer (human PRPF39 homodimer), Nam8 (human TIA-1 
and TIA-R), Luc7 (human LUC7L1-3), and the Yhc1 C terminus (human 
U1C) have clear counterparts in the human system. The yeast-specific U1 
snRNA insertions may be replaced in the human system by alternative 
splicing factors that modulate interactions with the U2 5’ region. 
e, Model of the yeast E complex on the basis of the U1 snRNP structure 
and biochemical data. Luc7, Snu71 and Prp40 form a heterotrimer in 
vitro, and their interacting regions may be located near unassigned 
density (compare to Extended Data Fig. 1e) at the tip of an unassigned 
40-residue α-helix next to Luc7 ZnF2. This helix is likely to belong to the 
U1 subunit Snu71 or Prp40, consistent with protein crosslinking and 
protein secondary-structure prediction. Prp40 could then bind the yeast 
branch point-binding protein (BBP, human SF1), which in turn interacts 
with Mud2 (human U2AF65) to tether the pre-mRNA branch-point 
sequence in the E complex. 
Extended Data Fig. 5 | Conformational flexibility of the U2 snRNP. 
a, Two defined positions of the U1 snRNP-U2 3’ region could be identified 
relative to the U2 5’ region. A-complex models were fitted into classes 
two and four from round two of the 3D image classification (compare 
Extended Data Fig. 2a). The classes are aligned via their U2 5’ region, 
illustrating their relative flexibility. b, Cartoon schematic of observed 
positions of the U2 3’ region relative to the U2 5’ region in the A-complex 
(left), B-complex (centre), and activated B-complex (Bact) (right, 
modelled from previously published work). Although in the B-complex 
the U2 3’ region is free, in the A- and Bact-complexes the position of the 
U2 3’ region is influenced by interactions with Prp39 as well as Syf1 and 
Clf1, respectively. c, The U2 snRNP subunit Lea1 (human U2A’) helps 
to position the U2 snRNP 3’ domain in different spliceosome states. In 
our A-complex structure, the Prp39 TPR repeat T1 contacts the helical 
C terminus of Lea1. In the yeast C-complex structure, the non-modelled 
density for the Syf1 N terminus binds a neighbouring but non-overlapping 
surface of Lea1 (PDB ID 5LJ5). In the C*/P-complex (PDB ID 6EXN), 
the Syf1 N terminus binds yet another Lea1 surface and the U2 3’ domain 
is repositioned relative to its C-complex location. Together, this suggests 
that Lea1 provides multiple interfaces that can be used to position 
the U2 3’ domain in different spliceosomal complexes. d, Fit of the U2 3’ 
region coordinate model to the A1 cryo-EM density. The dashed black 
line separates the U2 3’ domain (Sm ring, Msl1 and Lea1 subunits and U2 
snRNA, left) and the SF3a subcomplex (Prp9, Prp11 and Prp21, right). 
Two orthogonal views are shown (Supplementary Video 2). e, Fit of the 
U2 5’ region coordinate model to the A3 cryo-EM density. A density 
consistent with the U2 snRNA stem IIa/b and the branch helix is observed. 
Two density thresholds are shown side by side (left, 0.0163; right, 0.0121), 
and orthogonal views are shown underneath (Supplementary Video 2). 
Extended Data Fig. 6 | Luc7 and Nam8 sequence alignments. a, The 
Luc7 (human LUC7-like) amino-acid sequence alignment comparing 
S. cerevisiae, Kluyveromyces lactis, Schizosaccharomyces pombe, Danio 
[...] generated with Clustal Omega and visualized with ESPript 3. For the 
[...] (purple) or PSIPRED secondary structure prediction (grey). Modelled 
regions (dashed line) and the Zn-coordinating residues of ZnF1 and ZnF2 
(asterisks) are indicated. Invariant or conserved residues are highlighted 
[...] (human TIA-1) comparing S. cerevisiae, K. lactis, S. pombe, Drosophila 
Extended Data Fig. 7 | Details of the pre-B-complex model. 
a, Multiple views of the pre-B-complex model, generated by combining 
functional and structural data from yeast and human systems. The 
mobility of the U1 snRNP relative to the U2 snRNP in the A-complex 
(this study) as well as of the U2 snRNP relative to tri-snRNP in the 
B-complex structure are indicated (left). The pre-B model contained only 
minor clashes, and a clash between the highly flexible Prp28 C-terminal 
RecA-2 lobe (from the human tri-snRNP) and the highly flexible U6 
snRNA 5’ stem loop (from the yeast B-complex) may be resolved by 
small movements of either domain. b, Structural comparisons of the yeast 
pre-B model (from this study) and the yeast B-complex structure (PDB 
ID 5NRL) suggest the existence of a molecular checkpoint to couple 5’SS 
transfer to U1 snRNP release and formation of the activation-competent 
B-complex. In the pre-B model (left) Sad1 tethers Brr2 through its 
interaction with the conserved Brr2 PWI domain, and the U1 snRNP 
and its U1-5’SS helix are positioned near the U6 ACAGAGA region and 
the helicase Prp28. Subsequent to Prp28-mediated 5’SS transfer, Brr2 
is repositioned onto its U4 snRNA substrate, guided by the B-complex- 
specific proteins (right). In this conformation the Brr2 helicase and its 
associated factors would clash with the U1 snRNP, consistent with U1 
snRNP destabilization and release in yeast and human B-complexes. 
Brr2 is now ready to initiate spliceosome activation and formation of 
the active site in the Bact-complex. Regions that are changed between 
pre-B- and B-complex models (black outline) and the clash between the 
Brr2-containing ‘helicase’ domain and the U1 snRNP in B-complex 
(red X) are indicated. The lower right panel would conform to the 
alternative ‘U1-B-complex’ model. 
Extended Data Fig. 8 | Model for early splicing events. a, Cartoon 
schematic of proposed early splicing events, detailing (i) assembly of the 
pre-B-complex spliceosome from the A-complex and the U4/U6.U5 tri- 
snRNP and (ii) the subsequent conversion to the pre-catalytic B-complex 
spliceosome. In the pre-B model the mobile U1 snRNP is next to Prp28, 
which is bound at the Prp8 N domain. To initiate 5’SS transfer, Prp28 
could clamp the pre-mRNA at, or next to, the U1-5’SS helix to destabilize 
it and to hand over the 5’SS to the U6 ACAGAGA region of tri-snRNP, 
consistent with protein-RNA crosslinks. Transfer of the 5’SS may 
induce the binding of the B-complex proteins to replace Prp28 at the 
Prp8 N domain and induce the large movement of Brr2 to its B-complex 
location on U4 snRNA. The U1 snRNP, now loosely tethered to U2, may 
dissociate from the B-complex owing to the steric clash with the Brr2- 
containing ‘helicase’ domain (Extended Data Fig. 7b). Consistent with 
this, the human pre-B-complex converts to a B-complex-like state in 
the presence of a 5’SS oligonucleotide, which coincides with U1 snRNP 
release. This model can explain how Brr2 is kept inactive to prevent 
premature U4/U6 duplex unwinding. The model thereby implies the 
existence of a molecular checkpoint, coupling 5’SS transfer from U1 to U6 
snRNA with Brr2 helicase repositioning and U1 snRNP release to generate 
the activation-competent B-complex spliceosome. b, Cartoon schematic 
of an alternative model for spliceosome assembly and 5’SS transfer that 
relies only on the yeast A-complex (from this work), tri-snRNP 
and B-complex structures. In this model the tri-snRNP that associates 
with the A-complex already contains the Brr2 helicase bound to the U4 
snRNA substrate and the yeast B-complex proteins at the Prp8 N-terminal 
domain. The tri-snRNP then binds the A-complex (transition I, 
‘Assembly’), requiring a substantial readjustment to avoid a steric clash 
of the Brr2-containing ‘helicase’ domain and the U1 snRNP (‘U1-B- 
complex’). The Prp28 helicase is then recruited to the U1 snRNP directly 
as the Prp28-binding site on the Prp8 N-terminal domain in human tri- 
snRNP is occupied by B-complex proteins. Prp28 then disrupts the 
U1-5’SS helix, leading to 5’SS transfer (transition II, ‘Transfer’). Similar 
to the ‘pre-B-complex’ assembly model in a, the U1 snRNP, now freed 
from the 5’SS, may then be released owing to a steric clash with the 
Brr2-containing ‘helicase’ domain. This model does not require Sad1. 
Compare to a. 
a Cryo-EM data collection and refinement statistics of the A-complex structure 
A1 (U2 3’ region) A2 (U1 snRNP) A3 (U2 5’ region) 
Data collection 
Particles 153,556 153,556 19,937 
Pixel size (Å) 1.13 1.13 1.13 
Defocus range (µm) −0.4 to −4.4 −0.4 to −4.4 −0.4 to −4.4 
Voltage (kV) 300 300 300 
Electron dose (e⁻ Å⁻²) 27-56 27-56 27-56 
Reconstruction (RELION) 
Accuracy of rotations (°) 1.21 0.95 1.70 
Accuracy of translations (pixel) 1.47 0.92 1.91 
Resolution (Å) 4.9 4.0 10.4 
Map sharpening B-factor (Å²) −188 −146 0 
Model composition 
Non-hydrogen atoms 13,333 28,244 23,788 
Protein residues 1,408 2,803 2,765 
RNA bases 91 338 78 
Refinement (PHENIX) 
Map CC (around atoms) 0.738 
Rms deviations 
Bond lengths (Å) 0.016 
Bond angles (°) 2.02 
Validation 
Molprobity score 1.98 
All-atom clashscore 8.06 
Rotamer outliers (%) 0.34 
C-beta deviations 2 
Ramachandran plot 
Outliers (%) 0.5 
Allowed (%) 9.21 
Favoured (%) 90.29 
RNA validation 
Correct sugar puckers (%) 97.6 
Good backbone conformations (%) 77.2 
Data deposition 
EMDB ID EMD-4363 EMD-4364 EMD-4365 
PDB ID for the complete model 6G90 
b FSC between map A2 and the refined A-complex U1 snRNP coordinate model (plot; x axis, resolution (1/Å), 0 to 0.44) 
Extended Data Fig. 9 | Data collection, refinement statistics and 
validation. a, Cryo-EM data collection and refinement statistics of the 
A-complex structure. Maps A1 and A3 were used to position the U2 
snRNP 3’ and 5’ regions, respectively. b, FSC between the A2 cryo-EM 
density and the refined A-complex U1 snRNP coordinate model. 
Extended Data Table 1 | Summary of the components modelled into the A-complex cryo-EM densities 

Proteins and RNA included in the model 
Sub-complex | Protein/RNA | Total residues | M.W. (kDa) | Modelled residues | Modelling template (PDB ID) | Modelling | Human name 
U1 snRNP | Mud1 | 298 | 34.4 | 17-42; 62-81; 84-94; 97-123; 134-148 | 4PKD | Docked | U1A 
U1 snRNP | Snp1 | 300 | 34.4 | 5-55; 58-88; 94-204 | 4PJO, 4PKD | Docked and rebuilt | U1-70K 
U1 snRNP | Yhc1 | 231 | 27.0 | 2-59; 67-142; 153-195 | 4PJO, 4PKD | Docked, rebuilt, de novo | U1C 
U1 snRNP | Prp39 | 629 | 74.8 | 47-63; 66-85; 88-102; 108-119; 124-136; 139-154; 160-172; 177-190; 193-208; 217-236; 250-266; 271-275; 276-286; 289-304; 307-321; 325-382; 388-553; 561-626 | | de novo | PRPF39 
U1 snRNP | Prp42 | 544 | 65.1 | 2-542 | | de novo | PRPF39 
U1 snRNP | Nam8 | 523 | 57.0 | 292-400; 404-425; 434-449; 491-497; 501-521 | | de novo | TIA-1 
U1 snRNP | Snu56 | 492 | 56.5 | 45-104; 109-170; 185-294 | | de novo | 
U1 snRNP | Luc7 | 261 | 30.2 | 5-20; 39-59; 67-84; 91-120; 126-138; 175-187; 195-241 | | de novo | LUC7L 
U1 snRNP | Snu71 | 620 | 71.4 | 2-43 | | de novo | RBM25 
U1 snRNP | SmB | 196 | 22.4 | 2-63; 73-131 | 5NRL | Docked and adjusted | SmB 
U1 snRNP | SmD3 | 101 | 11.2 | 3-95 | 5NRL | Docked and adjusted | SmD3 
U1 snRNP | SmD1 | 146 | 16.3 | 1-73; 78-119 | 5NRL | Docked and adjusted | SmD1 
U1 snRNP | SmD2 | 110 | 12.9 | 8-108 | 5NRL | Docked and adjusted | SmD2 
U1 snRNP | SmE | 96 | 9.7 | 8-63; 73-93 | 5NRL | Docked and adjusted | SmE 
U1 snRNP | SmF | 86 | 10.4 | 12-84 | 5NRL | Docked and adjusted | SmF 
U1 snRNP | SmG | 77 | 8.5 | 2-77 | 5NRL | Docked and adjusted | SmG 
U1 snRNP | U1 snRNA | 568 | 182.3 | 1-61; 67-95; 103-112; 115-144; 152-173; 181-202; 236-258; 260-264; 270-275; 280-287; 295-325; 378-394; 424-440; 516-532; 538-564 | 4PJO, 4PKD | Docked and de novo | 
 | Unknown | | | 1-56 | | de novo | 
U2 snRNP | Msl1 | 111 | 12.8 | 28-111 | 5NRL | Docked | U2-B'' 
U2 snRNP | Lea1 | 238 | 27a. | 1-170 | 5NRL | Docked | U2-A' 
U2 snRNP | SmB | 196 | 22.4 | 12-54; 76-102 | 5NRL | Docked | SmB 
U2 snRNP | SmD3 | 101 | 11.2 | 4-85 | 5NRL | Docked | SmD3 
U2 snRNP | SmD1 | 146 | 16.3 | 1-48; 78-101 | 5NRL | Docked | SmD1 
U2 snRNP | SmD2 | 110 | 12.9 | 17-108 | 5NRL | Docked | SmD2 
U2 snRNP | SmE | 96 | 9.7 | 10-63; 71-93 | 5NRL | Docked | SmE 
U2 snRNP | SmF | 86 | 10.4 | 12-84 | 5NRL | Docked | SmF 
U2 snRNP | SmG | 77 | 8.5 | 2-76 | 5NRL | Docked | SmG 
U2 snRNP | Hsh155 | 971 | 110.0 | 132-149; 157-971 | 5NRL | Docked | SF3B1 
U2 snRNP | Rse1 | 1361 | 153.8 | 53-305; 323-571; 581-784; 814-890; 918-1265; 1292-1361 | 5NRL | Docked | SF3B3 
U2 snRNP | Cus1 | 436 | 50.3 | 125-213; 239-353; 361-376 | 5NRL | Docked | SF3B2 
U2 snRNP | Hsh49 | 213 | 24.5 | 9-86; 106-144; 147-185; 189-203 | 5NRL | Docked | SF3B4 
U2 snRNP | Rds3 | 107 | 12.3 | 2-104 | 5NRL | Docked | SF3B14b 
U2 snRNP | Ysf3 | 85 | 10.0 | 2-84 | 5NRL | Docked | 
U2 snRNP | Prp9 | 530 | 63.0 | 1-97; 112-378; 407-478; 503-528 | 5NRL | Docked | 
U2 snRNP | Prp11 | 266 | 29.9 | 34-47; 51-105; 115-136; 149-253 | 5NRL | Docked | 
U2 snRNP | Prp21 | 280 | 33.1 | 89-206; 220-228 | 5NRL | Docked | 
U2 snRNP | U2 snRNA | 1175 | 363.8 | 3-13; 30-73; 79-86; 108-122; 139-150; 1089-1109; 1115-1130; 1138-1154; 1159-1169 | 5NRL | Docked | 
 | UBC4 U/A pre-mRNA | 135 | 40.6 | -1-10; 51-53; 57-79 | 5NRL, 4PJO | Docked and rebuilt | 
PtdIns(4,5)P2 stabilizes active states of GPCRs and 
enhances selectivity of G-protein coupling 

Hsin-Yung Yen, Kin Kuan Hoi, Idlir Liko, George Hedger, Michael R. Horrell, Wanling Song, Di Wu, Philipp Heine, 
Tony Warne, Yang Lee, Byron Carpenter, Andreas Plückthun, Christopher G. Tate, Mark S. P. Sansom 
& Carol V. Robinson 


G-protein-coupled receptors (GPCRs) are involved in many 
physiological processes and are therefore key drug targets. 
Although detailed structural information is available for GPCRs, 
the effects of lipids on the receptors, and on downstream coupling of 
GPCRs to G proteins are largely unknown. Here we use native mass 
spectrometry to identify endogenous lipids bound to three class A 
GPCRs. We observed preferential binding of phosphatidylinositol- 
4,5-bisphosphate (PtdIns(4,5)P2) over related lipids and confirm 
that the intracellular surface of the receptors contains hotspots for 
PtdIns(4,5)P2 binding. Endogenous lipids were also observed bound 
directly to the trimeric Gαsβγ protein complex of the adenosine A2A 
receptor (A2AR) in the gas phase. Using engineered Gα subunits 
(mini-Gαs, mini-Gαi and mini-Gα12), we demonstrate that the 
complex of mini-Gαs with the β1 adrenergic receptor (β1AR) 
is stabilized by the binding of two PtdIns(4,5)P2 molecules. By 
contrast, PtdIns(4,5)P2 does not stabilize coupling between β1AR 
and other Gα subunits (mini-Gαi or mini-Gα12) or a high-affinity 
nanobody. Other endogenous lipids that bind to these receptors have 
no effect on coupling, highlighting the specificity of PtdIns(4,5)P2. 
Calculations of potential of mean force and increased GTP 
turnover by the activated neurotensin receptor when coupled to 
trimeric Gαiβγ complex in the presence of PtdIns(4,5)P2 provide 
further evidence for a specific effect of PtdIns(4,5)P2 on coupling. 
We identify key residues on cognate Gα subunits through which 
PtdIns(4,5)P2 forms bridging interactions with basic residues on 
class A GPCRs. These modulating effects of lipids on receptors 
suggest consequences for understanding function, G-protein 
selectivity and drug targeting of class A GPCRs. 

The emerging view from biophysical studies of GPCRs is that they 
exist as ensembles of discrete conformations that can be influenced 
by ligands, regulatory proteins, pH, ions and, potentially, lipid 
molecules. The complex roles of these conformational ensembles in 
signalling pathways are further compounded by the combinatorial 
effects of the multiple distinct heterotrimeric complexes formed 
from 21 Gα, 6 Gβ and 12 Gγ subunits. Investigating the relationship 
between GPCRs, small molecule modulators and numerous binding 
partners is therefore challenging, owing to the difficulty of observing 
the complexity of these interactions directly. A previous study 
characterized interactions of lipids with the β2 adrenergic receptor (β2AR) in 
high-density lipoparticles to which phospholipids were added 
exogenously, but did not address the selectivity and effects of different 
phosphatidylinositol (PI) phosphate lipids on coupling with 
downstream effectors. In this study, we develop and apply high-resolution 
native mass spectrometry to interrogate endogenous lipid-receptor 
interactions of three class A GPCRs: the β1 adrenergic receptor 
(β1AR), the adenosine A2A receptor (A2AR), and neurotensin 
receptor 1 (NTSR1). We reveal effects of PtdIns(4,5)P2 that stabilize these 
receptors in active states, increase GTPase activity and enhance 
selectivity of coupling to G proteins. 


First, we considered the endogenous lipids that bind directly to β1AR 
and the stabilized NTSR1(HTGH4-ΔIC3B), which were expressed 
in and purified from insect cells and Escherichia coli, respectively. 
Peaks corresponding to lipid adducts were observed for β1AR and for 
NTSR1 (Fig. 1a and Extended Data Fig. 1a). Collisional dissociation 
of protein-lipid complexes allowed us to identify two major classes of 
lipids bound to β1AR, the phosphatidylserines (PS) (34:2 and 36:2) and 
PI phosphates (42:5), as well as phosphatidic acid (PA) (36:2), which 
bound to NTSR1 (Extended Data Fig. 1b, c and Extended Data Table 1). 
To investigate this selectivity, we incubated NTSR1 with PA and other 
anionic lipids (PS and PI), a zwitterionic lipid (phosphatidylcholine 
(PC)), and a neutral lipid (diacylglycerol (DAG)). Analysis of the 
resulting native mass spectra shows that NTSR1 interacts preferentially with 
PA, PS and PI (Extended Data Fig. 2a-e). We did not observe apparent 
binding of phosphatidylglycerol (PG) to NTSR1, although PG has been 
reported to increase G-protein activation of NTSR1 in a nanodisc. 
It is possible that PG affects the local net charge at the receptor-lipid 
interface. Similarly, β1AR, when incubated with detergent-solubilised 
PS (16:0-18:1) or phosphatidylinositol-4-phosphate (PtdIns(4)P) 
(18:1-18:1), showed higher affinity towards PtdIns(4)P than to PS 
(Fig. 1a and Extended Data Fig. 2f, g). 

To probe the selectivity of different PI derivatives we incubated β1AR 
with equimolar ratios of PI, PtdIns(4)P, phosphatidylinositol-4,5-bisphosphate 
(PtdIns(4,5)P2) and phosphatidylinositol-3,4,5-trisphosphate 
(PtdIns(3,4,5)P3), all containing the same acyl chains (18:1-18:1). 
Plotting intensity of peaks corresponding to lipid-bound states 
in the mass spectrum, relative to those of the apo protein, showed that 
PtdIns(4,5)P2 had a higher affinity than PtdIns(4)P for β1AR (Fig. 1b). 
In the case of PtdIns(3,4,5)P3, which contains one more phosphate 
group than PtdIns(4,5)P2, binding to β1AR was reduced to a similar 
level as observed for PI. This demonstrates that binding is selective for 
the head group of PtdIns(4,5)P2. We performed similar experiments for 
NTSR1 and A2AR, and in both cases PtdIns(4,5)P2 was found to bind 
with the highest affinity (Extended Data Fig. 3), implying that all three 
class A GPCRs contain preferential binding sites for PtdIns(4,5)P2. 
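
The comparison described above, in which lipid-bound peak intensities are expressed relative to the apo peak, can be sketched in a few lines of Python. This is an illustration only: the function name `relative_to_apo` and all intensity values are hypothetical and are not taken from the study.

```python
def relative_to_apo(intensities):
    """Express each lipid-bound peak as a ratio to the apo (0 lipids) peak.

    `intensities` maps the number of bound lipids to a summed peak
    intensity from a deconvolved spectrum (arbitrary units).
    """
    apo = intensities.get(0, 0.0)
    if apo <= 0:
        raise ValueError("apo peak intensity must be positive")
    return {n: i / apo for n, i in intensities.items() if n > 0}


# Hypothetical intensities for two lipids measured at the same concentration.
pip2 = {0: 50.0, 1: 30.0, 2: 20.0}
pi = {0: 80.0, 1: 15.0, 2: 5.0}

for name, spectrum in (("PtdIns(4,5)P2", pip2), ("PI", pi)):
    ratios = relative_to_apo(spectrum)
    print(name, {n: round(r, 3) for n, r in sorted(ratios.items())})
```

A larger bound-to-apo ratio at equal lipid concentration is read here as higher apparent affinity, which is the qualitative comparison made in the text.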

We performed coarse-grained molecular dynamics (CGMD) 
simulations (Extended Data Fig. 4) to characterize the molecular nature of 
GPCR-PtdIns(4,5)P2 interactions in a phospholipid bilayer 
environment. PtdIns(4,5)P2 molecules bound at the interface formed by the 
cytoplasmic loops linking transmembrane helix (TM)1, TM2, TM4 and 
TM7 of NTSR1; this binding was mediated via interactions between 
the phosphorylated inositol head group and basic protein side chains 
(Fig. 1c and Extended Data Fig. 4a). Simulation of NTSR1-PS 
interactions indicated that these were lower-intensity, diffuse interactions that 
did not compete with PtdIns(4,5)P2 (Extended Data Fig. 4c). Similar 
interactions were seen with β1AR, which also exhibited the capacity to 
interact with PtdIns(4,5)P2 via the positively charged intracellular 
surfaces of TM5, TM6 and TM7 (Extended Data Fig. 4b). A more extensive 
comparison of simulations for nine class A GPCRs (Extended Data 
Fig. 1 | Identification of endogenous lipids, preferential binding 
of PI(4,5)P2, molecular dynamics simulation and site-directed 
mutagenesis define intracellular PtdIns(4,5)P2-binding hotspots. 
a, Mass spectrum of β1AR (agonist free, green; charge state is shown) 
and β1AR adducts (red, orange). Peaks (highlighted yellow) are 
selected in the quadrupole and analysed by tandem mass spectrometry. 
Phosphatidylserine (PS) and PtdIns(4)P (PIP) were identified in the 
resulting mass spectra. Binding curves plotted against lipid concentration 
confirm preferential binding of PtdIns(4)P over PS. b, Mass spectra 
of β1AR following incubation with an equimolar solution containing 
PI, PtdIns(4)P, PtdIns(4,5)P2 and PtdIns(3,4,5)P3. Binding curves 
confirm favourable binding of PtdIns(4,5)P2. c, CGMD simulation for 


Fig. 4d) showed that this pattern of interactions with PtdIns(4,5)P2 at 
the intracellular ends of transmembrane helices is conserved, suggesting 
that it is structurally and/or functionally significant. 

To locate preferential binding sites for PtdIns(4,5)P2, we performed 
site-directed mutagenesis on NTSR1, mutating residues that we 
identified as forming contacts with PtdIns(4,5)P2 (Fig. 1d) to residues that 
retain the expression and folded state of the receptor. We developed a 
mass-spectrometry-based strategy to analyse the effect of these 
mutations on PtdIns(4,5)P2 binding (Extended Data Fig. 5a). Mutating 
selected Lys or Arg residues to residues of lower mass decreased the 
molecular weight of the receptor in comparison to the unmodified 
parental receptor. When incubated with PtdIns(4,5)P2, an equimolar 
solution of mutant and unmodified receptor is presented with an 
identical lipid environment and can be resolved by mass 
spectrometry. Attenuation of PtdIns(4,5)P2 binding was observed in TM1 
(35 ± 0.03%) and TM4 (70 ± 0.13%) (Fig. 1d and Extended Data 
Fig. 5b), implying that the cytoplasmic surfaces of these receptors 
contain hotspots for PtdIns(4,5)P2 binding. 
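
One way to turn such a mixed spectrum into an attenuation value is sketched below. The metric used here (one minus the ratio of lipid-bound fractions of mutant and parental receptor) is an assumption made for illustration, since the text does not spell out the formula; the function names and intensities are hypothetical.

```python
def bound_fraction(apo, bound):
    """Fraction of receptor carrying at least one lipid."""
    total = apo + bound
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return bound / total


def attenuation(wt, mutant):
    """Relative loss of lipid binding in the mutant versus the parental receptor.

    Each argument is an (apo_intensity, lipid_bound_intensity) pair read
    from the same spectrum, so both species saw the same lipid pool.
    """
    wt_fraction = bound_fraction(*wt)
    if wt_fraction == 0:
        raise ValueError("parental receptor shows no lipid binding")
    return 1.0 - bound_fraction(*mutant) / wt_fraction


# Hypothetical intensities (arbitrary units) for one equimolar mixture.
example = attenuation(wt=(40.0, 60.0), mutant=(82.0, 18.0))
print(f"attenuation: {example:.0%}")
```

Because both receptors are measured in one spectrum, differences in lipid concentration or instrument settings cancel in the ratio.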

On the basis of the location of these sites on the cytoplasmic 
surface, we hypothesized that PtdIns(4,5)P2 binding influences 
downstream G-protein coupling. To investigate this, we developed a 
NTSR1(TM86V-ΔIC3B) embedded in a lipid bilayer containing mixed PC 
and PtdIns(4,5)P2. Green spheres represent basic residues with high levels 
of interaction with lipids; purple surfaces represent regions with high 
density of occupation by PtdIns(4,5)P2 (0.6-nm distance cut-off based on 
the radial distribution of coarse-grained particles). d, Left, highlighted 
residues are mutated in NTSR1(TM86V-ΔIC3B): TM1 (R43G, K44G and 
K45G; red), TM4 (R135I, R137T, K139L and K140L; orange) and TM7-H8 
(R311N; green). Right, inhibition of PtdIns(4,5)P2 binding. Data are 
mean ± s.d. from three independent experiments. Results indicate that 
mutations on the TM4 interface have a greater effect than those on the 
TM1 and TM7-H8 interfaces. Binding curves in a and b are plotted as 
mean ± s.d. of three replicates from one experiment. 


mass-spectrometry-based assay in which the pentameric complex of 
A2AR (A2AR-mini-Gαsβγ-Nb35; Nb35 is a stabilizing nanobody) 
was preserved in vacuum. The heteropentamer separated into several 
subcomplexes following collision-induced dissociation, and PS and 
PI were observed to be directly bound to A2AR at higher abundance 
than they were before G-protein coupling (Fig. 2a and Extended Data 
Fig. 3d). We reasoned that in receptor-Gαβγ complexes, these lipids 
may have a stabilizing role, thereby, in turn, increasing signalling. To 
investigate these effects, we measured the GTPase activity of Gαiβγ 
when coupled to active NTSR1 (bound to neurotensin(8-13)) in the 
presence or absence of PtdIns(4,5)P2. We found that GTP hydrolysis 
was enhanced 1.3-fold in the presence of PtdIns(4,5)P2. Therefore, 
PtdIns(4,5)P2 enhances both G-protein coupling and GTPase activity 
(Fig. 2b). 
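
A fold change of this kind, together with the two-sample Student's t statistic named in the legend of Fig. 2, can be computed as in the following sketch. The replicate values are hypothetical placeholders, not the measured GTPase activities, and equal variances are assumed because that is what the classical Student's test requires.

```python
import math
import statistics


def fold_change(control, treated):
    """Ratio of mean treated activity to mean control activity."""
    return statistics.mean(treated) / statistics.mean(control)


def student_t(control, treated):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Equal variances are assumed.
    """
    n1, n2 = len(control), len(treated)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(control)
              + (n2 - 1) * statistics.variance(treated)) / df
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (statistics.mean(treated) - statistics.mean(control)) / se, df


# Hypothetical normalized GTPase activities from three experiments each.
without_lipid = [1.0, 1.1, 0.9]
with_lipid = [1.3, 1.4, 1.2]

t_value, dof = student_t(without_lipid, with_lipid)
print(f"fold change = {fold_change(without_lipid, with_lipid):.2f}")
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```

Converting the t statistic into a P value requires the t distribution, which is left to a statistics package.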

Because of the instability of the trimeric G-protein complex, it is 
not possible to explore the effects of lipids on coupling in an 
unbiased way. We therefore investigated receptor complexes formed with 
engineered mini-G subunits that recapitulate the increase in agonist 
affinity observed upon coupling with the native heterotrimeric G 
protein (Fig. 2c). We recorded mass spectra of thermostabilized β1AR in 
complex with mini-Gs. We found increased association of lipids when 
Fig. 2 | Selectivity of G-protein coupling and the presence of 
endogenous lipids on coupled receptors. a, A representative mass 
spectrum of A2AR coupled to a trimeric G-protein complex 
in complex with the stabilizing nanobody Nb35 (top left) from three 
independent experiments. Isolating and subjecting charge state 26+ 
(orange peak) to collision-induced dissociation results in dissociation into 
subcomplexes (bottom) and the receptor with lipid adducts (top right). 
b, GTPase assays indicate an increase of GTP hydrolysis by active NTSR1 
coupled to trimeric Gαi in the presence of PtdIns(4,5)P2. ***P < 0.001; 


β1AR was in a complex with mini-Gs (Fig. 2d). The stability of the 
receptor-mini-Gs complex allowed us to investigate the selectivity 
towards different subtypes of Gα subunits (Gαs, Gαi/o and Gα12/13). We 
investigated the coupling of agonist-bound β1AR to mini-Gi/s, which 
was engineered from mini-Gs by introducing nine mutations on the α5 
helix to the corresponding residues on Gαi. We performed a similar 
experiment with the analogous mutant of Gα12, in which we transferred 
the mutations from mini-Gs to G12. In comparison to mini-Gs, there 
was a reduced degree of coupling with mini-Gi/s and virtually no 
coupling with mini-G12 (Fig. 2d). 

To investigate the effect of PtdIns(4,5)P2 on GPCR-mini-Gs 
interactions, we incubated agonist-bound β1AR with mini-Gs in the presence 
of lipid and compared the mass spectrometry peaks corresponding to 
the lipid-bound protein. Although the complex can form in the absence 
of lipids, or with only one bound PtdIns(4,5)P2, complex formation is 
markedly enhanced (2.7- or 4.5-fold compared to the receptor 
without lipid, respectively) in the presence of two or three PtdIns(4,5)P2 
molecules (Fig. 3a, g). We observed a similar effect in a time-course 
experiment in which coupling of mini-Gs to β1AR increased by 21 ± 6% 
when two PtdIns(4,5)P2 molecules were bound and by a further 
12 ± 5% when three PtdIns(4,5)P2 molecules were bound (Extended 
Data Fig. 6a). 

We examined the effect of PS, an anionic lipid that was 
endogenously bound to β1AR (Fig. 1a), on coupling of mini-Gs. We 
performed analogous experiments using a threefold higher concentration 
of PS than that used in the experiments with PtdIns(4,5)P2 to reflect 
the reduced affinity of β1AR for PS (Fig. 3b and Extended Data Fig. 2). 
Mass spectra showed only a slight increase in the extent of mini-Gs 
coupling as a function of PS binding. This reduced effect in 
comparison to PtdIns(4,5)P2 suggests that electrostatic interactions between the 
polyanionic headgroup of PtdIns(4,5)P2 and multiple 
basic sidechains are necessary for receptor coupling (as observed for 
Kir channels, for example), and that such interactions do not occur 
with PS. 

These data indicate that additional PtdIns(4,5)P2 molecules, but not PS, stabilize 
the complex once receptor coupling has occurred. Therefore, we used 
Student’s t-test comparing the effect of PtdIns(4,5)P2 (one variable) on 
receptor-induced GTPase activation. Bars show mean ± s.d., points show 
data from three independent experiments. c, Schematic representation 
of the influence of lipids and agonists on the binding of mini-G proteins. 
d, Mass spectra of isoprenaline-bound β1AR with three different mini-G 
subunits (mini-Gs, mini-Gi/s and mini-G12). Enhanced coupling and 
lipid adducts are observed in the presence of Gs (top right). In bottom 
right, bars show mean ± s.d., points show data from three independent 
experiments. 


potential of mean force (PMF) calculations to explore the effect of 
PtdIns(4,5)P2 binding on the free-energy landscape of A2AR-mini-Gs 
interactions. Comparison of PMFs for PtdIns(4,5)P2-bound versus 
PS-bound receptor in a lipid bilayer indicates that the interaction of 
mini-Gs with A2AR is stabilized significantly (50 ± 10 kJ mol−1) in the 
presence of PtdIns(4,5)P2 compared with PS (Fig. 3c and Extended 
Data Fig. 6b). The presence of PtdIns(4,5)P2 at the interface between the 
receptor and mini-Gs in the PMF calculation implies that PtdIns(4,5)P2 
molecules form bridging interactions to stabilize the complex. 
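
To give a feel for the size of a free-energy difference of this magnitude, the sketch below converts it into a Boltzmann factor exp(ΔΔG/RT). The temperature of 310 K is an assumption, and the function name is hypothetical. Coarse-grained PMFs are approximate, so the resulting number should be read as an order-of-magnitude illustration and not as a measured affinity ratio.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def stabilization_ratio(ddg_kj_per_mol, temperature_k=310.0):
    """Boltzmann factor exp(ddG / RT) for a stabilization of ddG.

    A positive ddG means the complex is more stable with PtdIns(4,5)P2
    than with PS; the default temperature is an assumed 310 K.
    """
    return math.exp(ddg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))


# Central value and the +/- 10 kJ/mol bounds quoted in the text.
for ddg in (40.0, 50.0, 60.0):
    print(f"ddG = {ddg:.0f} kJ/mol -> ratio ~ {stabilization_ratio(ddg):.2e}")
```

The exponential dependence means that the quoted ±10 kJ mol−1 uncertainty alone spans several orders of magnitude in the ratio.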

The increase in PtdIns(4,5)P2 binding to β1AR when it is coupled to 
mini-Gs could be a result of either (i) active conformations of receptors 
binding more PtdIns(4,5)P2 than their inactive counterparts, or 
(ii) positively charged residues in mini-Gs, at the receptor-G protein 
interface, recruiting additional PtdIns(4,5)P2 molecules following 
coupling. To investigate the dependence of PtdIns(4,5)P2 binding on receptor 
conformation, we incubated PtdIns(4,5)P2 with β1AR (co-purified 
with the agonist isoprenaline) containing an E130W mutation to 
stabilize ligand-free β1AR without affecting G-protein coupling. We 
observed a 31 ± 1% increase in PtdIns(4,5)P2 binding to the 
β1AR-isoprenaline complex versus ligand-free β1AR (Extended Data Fig. 6c). 
Whereas in general, transition to active states is thought to involve 
substantial movements of TM5 and TM6, intracellular loop (ICL)2 
was also found to undergo significant changes during activation of the 
μ-opioid receptor. These results are consistent with PtdIns(4,5)P2 
stabilizing active states of receptors via binding hotspots directly on ICL2, 
and, more generally, via diffuse intracellular PtdIns(4,5)P2-binding sites. 

To explore the second possibility, in which additional 
PtdIns(4,5)P2-binding sites form following coupling, we carried out CGMD 
simulations for A2AR-mini-Gs, which is, to our knowledge, the only 
available structure of a receptor-mini-G complex. In addition to the 
contacts described above, PtdIns(4,5)P2 interacted with residues of 
mini-Gs proximal to the lipid contacts in TM3, TM4 and TM5 of A2AR 
(Fig. 3e). To investigate the significance of these additional binding sites 
we used a nanobody (Nb6B9), in which the lipid-binding residues 
identified in mini-Gs are absent (Extended Data Fig. 7). Structures 
of receptors bound to Nb6B9 or to mini-Gs are virtually identical 
Fig. 3 | The effect of PtdIns(4,5)P2 on coupling to mini-Gs, and 
comparison with PS, Nb6B9 and mini-Gi. a, Representative mass spectra 
of β1AR and β1AR-mini-Gs (n = 3 independent experiments) in the 
presence of PtdIns(4,5)P2 and the agonist isoprenaline. Coloured peaks 
highlight β1AR lipid-bound states (top) and β1AR-mini-Gs lipid-bound 
states (bottom). b, Representative mass spectra of β1AR and 
β1AR-mini-Gs (n = 3 independent experiments) in the presence of PS and the agonist 
isoprenaline. There is no marked difference in PS binding between 
β1AR and β1AR-mini-Gs. c, Snapshots of steered molecular dynamics 
simulations to separate mini-Gs and A2AR in the presence of 
PtdIns(4,5)P2 (green) and PS (pink). Orange outlines highlight the different binding 
modes of PtdIns(4,5)P2 and PS to the receptor. The interaction of 
mini-Gs with A2AR is stabilized by ~50 kJ mol−1 in the presence of 
PtdIns(4,5)P2 relative to PS (Extended Data Fig. 6b). d, Representative mass spectra 


(root mean square displacement (r.m.s.d.) = 0.4-0.6 Å). Comparing 
PtdIns(4,5)P2 binding to the receptor and to the receptor-nanobody 
complex, we found that the degree of PtdIns(4,5)P2 binding was very 
similar (Fig. 3d, g). The absence of lipid-binding residues in Nb6B9 
(Fig. 3e) explains the insensitivity of the receptor-nanobody complex 
to PtdIns(4,5)P2 and implies that PtdIns(4,5)P2 molecules enhance 
coupling via interactions that are specific to the receptor and mini-Gs. 
Lipids such as PS, in which the polyanionic headgroups are absent, 
would not be able to induce this effect. 

To investigate the possibility that residues specific to mini-Gs, 
that are not present in other G proteins, mediate bridging, we 
investigated the effects of PtdIns(4,5)P2 on the coupling of mini-Gi/s to 
agonist-bound β1AR. We found that coupling was increased in the 
presence of PtdIns(4,5)P2, but to a lesser extent than with mini-Gs (Fig. 3f, g). 
Given the established role in coupling to receptors of TM5 in Gαs 
(R380), together with residues identified by molecular dynamics 
simulation (Fig. 3e), and the fact that these residues are substituted in Gαi 
(E40, V41, K42, D216 and T380), differences in PtdIns(4,5)P2 binding 
can be attributed to disruption of these PtdIns(4,5)P2-bridging sites. 
It therefore follows that PtdIns(4,5)P2-binding sites on Gαs, which 
are not present on Gαi, enable simultaneous binding of the β1AR to 
the G protein to which it has highest affinity. Consequently, we 
propose that PtdIns(4,5)P2 acts as an allosteric modulator, binding to the 
following incubation of β1AR with PtdIns(4,5)P2 and isoprenaline in the 
absence or presence of Nb6B9 (Nb6B9:receptor, 0.3; n = 3 independent 
experiments). e, PtdIns(4,5)P2 contacts on A2AR-mini-Gs are shown on 
the receptor (purple) and mini-Gs (Thr40, His41, Arg42, Lys216 and 
Arg380; green), and juxtaposed to basic residues on the β2AR-Nb80 
complex (Nb80, purple). f, Representative mass spectra following 
incubation of β1AR with PtdIns(4,5)P2 and isoprenaline in the absence 
or presence of mini-Gi/s (n = 3 independent experiments). No difference 
was detected between peaks in the presence or absence of PtdIns(4,5)P2. 
g, Normalized intensity of different lipid-bound states of the apo state of 
isolated receptor or receptor complexes. *P < 0.05; one-way ANOVA with 
Dunnett’s multiple comparison test. Bars show mean ± s.d., points show 
data from three independent experiments. 


intracellular side of the receptor, stabilizing the active state and enhanc- 
ing selectivity of G-protein coupling. This coupling is then further sta- 
bilized by PtdIns(4,5)P2 molecules bridging between the receptor and 
the G protein. 

More generally, it has been established that the cytoplasmic face of 
GPCRs undergoes conserved conformational changes to allow 
coupling of G proteins; the cytoplasmic ends of TM5 and TM6 move 
outwards, and TM7 moves slightly inwards. Synthetic molecules that 
bind at the TM5-TM6-TM7 cytoplasmic interface act as negative 
allosteric modulators that inhibit the activation of GPCRs by 
preventing their movement and consequently reducing the affinity of agonists 
at the orthosteric binding pocket. Here we highlight another role 
of the cytoplasmic interface, which recruits PtdIns(4,5)P2, thereby 
stabilizing the active G-protein-bound state of the receptor. Simultaneous 
binding of the PtdIns(4,5)P2 head group to both the Gα subunit and 
conserved TM4 residues on a number of class A receptors that are not 
present on class B receptors suggests the generality of this mechanism 
for selectively stabilizing active states of class A GPCRs (Extended Data 
Figs. 4d, 8). 

As the local concentration of PtdIns(4,5)P2 in the membrane has 
the potential to be modulated by different signalling pathways, such 
as receptor tyrosine kinases or Ca2+ signalling, crosstalk with GPCRs 
through PtdIns(4,5)P2 may represent an additional mode of regulation 
in the cell. Further, the potential to stabilize the active 
conformation of G-protein-coupled receptors through the binding of potent 
small molecules that mimic the bridging effects of the PtdIns(4,5)P2 
head group provides a further avenue for stabilizing active states of 
GPCRs for therapeutic purposes. As PtdIns(4,5)P2 is able to 
discriminate between different G-protein subunits, and is likely to also 
influence binding to β-arrestin, there are potential benefits in developing 
novel compounds that bind specifically to different G-protein-coupled 
or β-arrestin-bound states, thereby providing a new perspective for 
rational design of novel biased allosteric agonists. 
METHODS 
Constructs and proteins. We used expression plasmids for two stabilized variants 
of rat NTSR1. NTSR1(HTGH4-ΔIC3B) contains the protein sequence from 
amino acids 50 to 390 with deletion of ICL3 (residues 273-290) and 26 
thermostabilizing point mutations. It should be noted that this construct is only 80% identical 
to the wild-type. NTSR1(HTGH4 43-421) contains the intact protein sequence 
from residues 43 to 421, with the same stabilizing mutations as 
NTSR1(HTGH4-ΔIC3B). Purified thermostabilized turkey (Meleagris gallopavo) β1AR, human 
wild-type A2AR, engineered Gαs (mini-Gs) and nanobody Nb6B9 were used for 
mass spectrometry analysis. The following point mutations on β1AR were 
used throughout: R68S, M90V, F327A, F338M (thermostabilizing); C116L (to 
increase protein expression); R284K (residue equivalent to β2AR designed to 
improve Nb80 binding); C358A (prevention of potential palmitoylation). In order 
to purify receptor in the unliganded state, a construct with the same 
thermostabilizing mutations but slightly different lengths of TM1 was introduced with an 
additional mutation (E130W) to stabilize the receptor. The use of an N-terminal 
TrxA fusion (C32S and C35S) on the receptor was necessary to confirm formation 
of a complex on SDS gels. Insect cell lines for receptor overexpression (Sf9 and 
Tni) were obtained from Invitrogen and Sf9 cells for heterotrimeric G protein 
production were provided by M. Hillenbrand. All cells were confirmed to be free 
from mycoplasma contamination. 
Protein expression and purification. Expression and purification of β1AR. 
M. gallopavo β1AR constructs (β118 and β114-E130W) were based on the previously 
published thermostabilized β1AR44-m23 construct but contained only four 
(R68S, M90V, F327A, F338M) of the original six thermostabilizing mutations, as 
the two mutations on TM5 and TM6 (Y227A and A282L) were not included. The 
omission of these two mutations resulted in constructs that demonstrated coupling 
to G proteins and to G protein mimetic nanobody Nb80 along with high affinity 
agonist binding. The constructs included E. coli Thioredoxin fused to the N 
terminus of TM1 and the mutations C116L to improve expression and C358A to 
prevent potential palmitoylation. Both constructs were expressed in Sf9 insect cells 
using recombinant baculoviruses prepared using the transfer vector pAcGP67B 
(BD Biosciences) and BacPAK6 linearized baculovirus DNA (Oxford Expression 
Technologies). The membrane containing the expressed receptor was solubilized 
and purified in 2% and 0.02% dodecylmaltoside (DDM, Generon), respectively, 
as described previously. For β118, the final purification step was competitive 
elution from an alprenolol sepharose ligand-affinity column in 20 mM Tris-HCl, 
pH 7.4, 350 mM NaCl and 0.02% DDM supplemented with 1 mM isoprenaline, so 
that the receptor was prepared with bound agonist ligand. The purified receptor 
was finally concentrated to 15 mg/ml in the alprenolol sepharose elution buffer. 
β114(E130W) contained the mutation E130W, which increased functional 
expression of β1AR. This mutation facilitated the preparation of highly purified 
active receptor without any bound ligand, as the use of a ligand-affinity 
chromatography step was not necessary to separate non-functional receptor. For 
β114(E130W), purification was performed in 0.02% DDM by Ni2+ affinity 
chromatography followed by a thrombin (Sigma) protease cleavage step to remove the 
His tag before further purification by size-exclusion chromatography (SEC) on a 
Superdex Increase 200 10/300GL column (GE Healthcare) in 20 mM Tris-HCl, 
pH 7.4, 100 mM NaCl and 0.02% DDM, with final concentration to 45 mg/ml. 
Expression and purification of A2AR. The human A2AR construct (residues 1-308) 
was modified with a C-terminal histidine tag (His10) preceded by a TEV protease 
cleavage site, and by the mutation N154A to prevent N-linked glycosylation. The 
A2AR was expressed in Tni insect cells using the baculovirus system. Cell 
membranes were prepared and solubilised with 2% lauryl maltose neopentyl glycol 
(LMNG, Anatrace) and the receptor was purified by Ni2+ affinity chromatography 
and SEC, using a Superdex Increase 200 10/300GL column (GE) run in 20 mM 
HEPES pH 7.5, 100 mM NaCl, 10% (v/v) glycerol, 0.01% (w/v) LMNG and 
concentrated to 10 mg/ml. Purification was as described previously, with the exception 
that the receptor was purified without addition of ligand. 
Expression and purification of mini-Gs, mini-Gi and mini-G12. The engineered 
minimal G proteins, mini-Gs construct R414, mini-Gi construct and mini-G12 
construct 8 were expressed in E. coli and purified by Ni2+ affinity chromatography, 
followed by cleavage of the histidine tag using TEV protease and negative 
purification on Ni2+-NTA to remove TEV and undigested mini-G protein, and finally SEC 
to remove aggregated protein as described elsewhere, with final concentration 
up to 100 mg/ml in 10 mM HEPES, pH 7.5, 100 mM NaCl, 10% v/v glycerol, 1 mM 
MgCl2, 1 µM GDP and 0.1 mM TCEP. 
Expression and purification of nanobody Nb6B9. A synthetic gene (Integrated 
DNA Technologies) for Nb6B9 was cloned into the plasmid pET-26b(+) 
(Novagen) with an N-terminal His6 tag followed by a thrombin protease cleavage 
site. Expression was in E. coli strain BL21(DE3)RIL (Agilent Technologies) and 
purification from the periplasmic fraction was by Ni2+ affinity chromatography, 
but with the use of a thrombin (Sigma) protease cleavage step to remove the His 
tag before concentration to 40 mg/ml. 


Preparation of receptor-G-protein complexes. Several receptor-G-protein complexes 
were prepared for mass spectrometry analysis. A2AR-mini-Gsβγ was prepared by 
incubating and co-purifying A2AR, containing a TrxA fusion at the N terminus, 
with N-ethyl-carboxamidoadenosine (NECA). The trimeric G-protein 
complex consisted of mini-Gs, Gβ, Gγ and nanobody Nb35 with receptor:G 
proteins:Nb35 at a 1:2:4 molar ratio to stabilize the complex. The complex was 
further purified by gel-filtration chromatography after overnight incubation. 
β1AR-mini-G was prepared by incubating β1AR co-purified with isoprenaline and the 
different mini-G proteins (mini-Gs, mini-Gi/s and mini-G12) at 1:1.2 molar ratio. 
The incubation time was varied to capture the equilibrium of complex formation. 
Purification of heterotrimeric G protein. Baculovirus encoding the desired subunits 
(αi1β1γ1) was used to express the heterotrimeric G protein in Sf9 cells as previously 
described. Cells from a 1-l expression culture were resuspended and lysed in 
lysis buffer (10 mM HEPES pH 7, 20 mM KCl, 10 mM MgCl2, 10 µM GDP, 2 mM 
β-mercaptoethanol, and cOmplete protease inhibitor (Roche)). The membranes 
were pelleted by ultracentrifugation at 108,000g for 35 min and solubilized in 
solubilisation buffer (50 mM HEPES pH 7, 150 mM NaCl, 10 mM MgCl2, 10 µM GDP, 
2 mM β-mercaptoethanol, 1% decyl-β-D-maltopyranoside (DM) (w/v), 10% (v/v) 
glycerol, and cOmplete protease inhibitor (Roche)) for 3 h. The supernatant was 
collected after centrifugation at 108,000g for 35 min and incubated with 1.2 ml 
TALON beads (GE Healthcare) overnight. The beads were collected and washed 
with ten column volumes of wash buffer (30 mM HEPES pH 7, 300 mM NaCl, 
10 mM MgCl2, 25 mM imidazole pH 8, 10 µM GDP, 2 mM β-mercaptoethanol, 
10% (v/v) glycerol, and 0.5% (w/v) DM), followed by another twenty-column-volume 
wash of wash buffer containing 40 mM imidazole (pH 8.0), and were eluted 
with five column volumes of elution buffer (30 mM HEPES pH 7, 150 mM NaCl, 
1 mM MgCl2, 300 mM imidazole pH 8, 10 µM GDP, 2 mM β-mercaptoethanol, 
10% (v/v) glycerol, and 0.5% (w/v) DM). The protein was further purified by a 
Superdex 200 Increase PC 3.2/300 column (GE Healthcare) and the protein tag was 
removed by incubation with human rhinovirus 3C protease (produced in house) 
overnight. Following buffer exchange to storage buffer (20 mM HEPES pH 7, 
100 mM NaCl, 0.1 mM MgCl2, 4 mM β-mercaptoethanol, 10% (v/v) glycerol, and 
0.5% (w/v) DM) and reverse immobilized metal affinity chromatography (IMAC) 
by Ni-NTA superflow beads (GE Healthcare), G-protein complex was concentrated 
to at least 2 mg/ml for experimental use. 

NTSR1 expression. BL21 E. coli cells were transformed with the expression plasmid 
encoding NTSR1(HTGH4-ΔIC3B) and grown overnight at 37°C in 20 ml 2YT 
medium supplemented with 1% (w/v) glucose and 100 µg/ml ampicillin. Two 
flasks, each containing 1 l 2YT medium, 0.5% (w/v) glucose, and 100 µg/ml 
ampicillin were inoculated with 10 ml pre-culture and grown to an A600 nm of 0.5 
with shaking at 37°C. Receptor expression was induced with 1 mM 
isopropyl-β-D-thiogalactopyranoside (IPTG) and cells were cultivated at 28°C overnight. 
Cells were harvested after overnight expression and E. coli cell pellets were 
resuspended in 100 ml solubilisation buffer, containing 100 mM HEPES pH 8.0, 20% 
(v/v) glycerol and 400 mM NaCl. Resuspended cells were frozen in liquid nitrogen 
and stored at −80°C. 

Apo NTSR1 purification. The cell pellet was thawed at room temperature. All 
following steps were carried out at 4°C. MgCl2 (5 mM), 2 mg DNase I, 200 mg 
lysozyme, and 20 ml detergent mixture (0.2% (w/v) cholesteryl hemisuccinate 
Tris salt (CHS) and 2% (w/v) dodecyl-β-D-maltopyranoside (DDM)) were added 
to the thawed cell pellet. The mixture was incubated for 1 h, followed by cell lysis 
via mild sonication for 30 min in an ice-water bath. After cell lysis, 0.4 ml 5 M 
imidazole was added and the mixture was incubated for another 30 min. The 
suspension was centrifuged for 30 min at 28,000g. The supernatant was mixed 
with 5 ml TALON resin (Clontech), which had been pre-equilibrated with IMAC 
binding buffer (25 mM HEPES pH 8.0, 10% (v/v) glycerol, 600 mM NaCl, 0.1% 
(w/v) DDM and 20 mM imidazole) and incubated overnight on a rolling device. 
The mixture was loaded into a PD10 column (GE Healthcare) and was washed 
with 50 ml IMAC binding buffer. Elution of bound protein was performed with 
15 ml IMAC elution buffer containing 25 mM Hepes pH 8.0, 10% (v/v) glycerol, 
150 mM NaCl, 0.1% (w/v) DDM and 250 mM imidazole. Eluted receptor 
was concentrated in an Amicon-15 Ultra concentrator with a 100 kDa cut-off 
(Millipore) to a final volume of less than 2.5 ml. Concentrated receptor sample was 
loaded on a Sephadex G-25 desalting column (GE Healthcare), pre-equilibrated 
with 25 mM HEPES pH 8.0, 10% (v/v) glycerol, 150 mM NaCl, 0.1% (w/v) 
DDM to remove remaining imidazole. Desalted receptor was incubated with 
300 µl 1.6 mg/ml HRV 3C protease for 1 h at 4°C, followed by addition of 150 µl 
10% (w/v) LMNG and incubation for 1 h at 4°C. The cleaved protein was diluted 
threefold with reverse IMAC buffer (10 mM HEPES pH 8.0, 10% (w/v) glycerol, 
150 mM NaCl, and 0.01% (w/v) LMNG) and was loaded onto a PD10 column 
containing 5 ml Ni-NTA beads pre-equilibrated with reverse IMAC buffer. The 
flow through was collected in an Amicon-15 Ultra concentrator with a 50-kDa 
cut-off and the resin was further washed with 10 ml buffer. Receptor was con- 
centrated to a final volume of less than 1 ml and was subjected to preparative 
SEC using a Superdex 200 10/300 GL column (GE Healthcare), which had been 
pre-equilibrated with 10 mM HEPES pH 8, 150 mM NaCl, and 0.01% (w/v) 
LMNG. Peak fractions corresponding to NTSR1(HTGH4-ΔIC3B) were pooled 
(final volume 3-4 ml) and concentrated in an Amicon-4 Ultra concentrator with 
a 50-kDa cut-off to a final protein concentration of approximately 50 µM. Purified 
and concentrated NTSR1-H4 was mixed with 10 mM HEPES pH 8, 150 mM NaCl, 
0.01% (w/v) LMNG, and 50% (v/v) glycerol to yield a final glycerol concentration 
of 25%. Aliquots were frozen in liquid nitrogen and stored at −80°C for later use. 
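
The glycerol step is a volume-weighted mix, which can be sketched as below. This is only an illustration: it assumes volumes are additive and that the concentrated sample contains no glycerol (the SEC buffer listed above has none); the function name and the 0.5 ml example volume are hypothetical.

```python
def stock_volume_for_target(sample_volume_ml, sample_pct, stock_pct, target_pct):
    """Volume of stock (ml) to add so that the mixture reaches target_pct (v/v).

    Solves sample_pct*Vs + stock_pct*Vx = target_pct*(Vs + Vx) for Vx,
    assuming volumes are additive.
    """
    low, high = sorted((sample_pct, stock_pct))
    if not low <= target_pct <= high:
        raise ValueError("target must lie between the two input concentrations")
    if stock_pct == target_pct:
        raise ValueError("stock concentration must differ from the target")
    return sample_volume_ml * (target_pct - sample_pct) / (stock_pct - target_pct)


# Hypothetical 0.5 ml sample without glycerol, mixed with the 50% glycerol buffer
volume_to_add = stock_volume_for_target(0.5, 0.0, 50.0, 25.0)
print(volume_to_add)
```

Under these assumptions the two solutions are mixed in equal volumes, which would also halve the protein concentration.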

Preparation of phospholipids and titration experiment. Phospholipids were 
purchased from Avanti (Avanti Polar Lipids) and prepared as 3 mM stock solu- 
tions in 200 mM ammonium acetate buffer pH 7.5 containing the detergent-mixed 
micelle preparation, containing DDM and foscholine as previously described*’. 
Phosphate analysis was performed to determine the concentration of phospholipids 
in solution. For the titration experiment, 5 µM buffer-exchanged receptors 
in 200 mM ammonium acetate buffer pH 7.5 containing the detergent mixtures 
(DDM, LMNG, and foscholine for NTSR1; DDM and foscholine for β1AR and 
A2AR) were mixed with lipids at various concentration points followed by 
equilibration at 4°C for 5 min, by which time lipid binding had stabilized according to 
our time course measurements. Following mass spectrometry analysis, UniDec 
(Universal Deconvolution) software was used to quantify the relative abundance 
of each lipid-bound state*, and statistical analysis was performed using GraphPad 
Prism, assuming a one-site total binding model. 
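
For orientation, a one-site total binding model is commonly written as a saturable term plus a linear nonspecific term and a constant background. The sketch below evaluates that form and runs a deliberately simplified fit (nonspecific and background terms held at zero, Kd scanned over a grid). It is an illustration only, not the regression performed in GraphPad Prism; the function names and the synthetic data are hypothetical.

```python
def one_site_total(x, bmax, kd, ns=0.0, background=0.0):
    """Saturable binding plus linear nonspecific binding and a constant offset."""
    return bmax * x / (kd + x) + ns * x + background


def fit_bmax_for_kd(xs, ys, kd):
    """Least-squares Bmax for a fixed Kd, with ns and background held at zero."""
    basis = [x / (kd + x) for x in xs]
    denom = sum(b * b for b in basis)
    if denom == 0:
        raise ValueError("need at least one non-zero concentration")
    bmax = sum(b * y for b, y in zip(basis, ys)) / denom
    sse = sum((y - bmax * b) ** 2 for b, y in zip(basis, ys))
    return bmax, sse


def grid_fit(xs, ys, kd_grid):
    """Coarse fit: scan candidate Kd values and keep the lowest squared error."""
    best = None
    for kd in kd_grid:
        bmax, sse = fit_bmax_for_kd(xs, ys, kd)
        if best is None or sse < best[2]:
            best = (bmax, kd, sse)
    return best


# Synthetic titration (illustrative values, not data from the paper)
concentrations = [0.0, 5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
signal = [one_site_total(x, bmax=80.0, kd=20.0) for x in concentrations]
bmax_fit, kd_fit, _ = grid_fit(concentrations, signal, [float(k) for k in range(1, 101)])
print(bmax_fit, kd_fit)
```

A full nonlinear regression would also fit the nonspecific slope and background and report confidence intervals, which this grid scan does not attempt.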

Lipidomics analysis. Co-purified lipids from recombinant GPCRs were extracted 
by chloroform-methanol (2:1, v/v) and lyophilized and re-dissolved in 60% 
acetonitrile (ACN). For LC-MS/MS analysis, the extracted lipids were separated 
on a C18 column (Acclaim PepMap 100, C18, 75 µm × 15 cm; Thermo Scientific) 
using a Dionex UltiMate 3000 RSLC nano LC System. The buffers and gradient 
are adapted from a previous protocol”. In brief, the lipids were separated using a 
binary buffer system at 40°C using a gradient of 32-99% buffer B at a flow rate of 
300 nl/min over 30 min (buffer A: acetonitrile:H2O (60:40), 10 mM ammonium 
formate, 0.1% formic acid; buffer B: propan-2-ol:acetonitrile (90:10), 10 mM 
ammonium formate, 0.1% formic acid). The column eluent was delivered via a 
dynamic nanospray source to a hybrid LTQ Orbitrap mass spectrometer (Thermo 
Scientific). Typical mass spectrometry conditions were: spray voltage (1.8 kV) and 
capillary temperature (175°C). The LTQ-Orbitrap XL was operated in negative 
ion mode using data-dependent acquisition with one MS scan followed by five 
MS/MS scans”. Survey full-scan mass spectra were acquired in the orbitrap (m/z 
350-2,000) with a resolution of 60,000. CID fragmentation in the linear ion trap 
was performed for the five most intense ions at an automatic gain control target of 
30,000 and a normalized collision energy of 38% at an activation of q=0.25 and 
an activation time of 30 ms. 

GTPase assay. The GTPase activity of trimeric Gay was measured with the 
GTPase-Glo assay (Promega). The assay was performed in white 384-well plates 
(Corning) using purified trimeric G proteins diluted into a GTPase buffer 
(10 mM HEPES pH 7, 50 mM NaCl, 0.05 mM MgCl2, 2 mM β-mercaptoethanol, 
1 mM DTT, 5% (v/v) glycerol, and 0.25% (w/v) DM) at a final concentration of 
2.5 µM in the presence of 5 µM GTP. The luminescent signal was measured after 
incubation at room temperature (1 h) following the manufacturer’s protocol to 
indicate the level of residual GTP. To analyse the impact of PtdIns(4,5)P2 we used 
NTSR1(HTGH4-ΔIC3B) co-purified with recombinant neurotensin(8-13) following 
the method described previously. The receptor was pre-incubated with 
detergent-solubilised PtdIns(4,5)P2 at 1:3 molar ratio (receptor:lipid) in the protein 
buffer (10 mM HEPES pH 8, 150 mM NaCl, 0.01% (w/v) LMNG) containing 
100 nM neurotensin(8-13) for 15 min on ice. The activated receptor was then added to 
the reaction mixture containing trimeric G proteins under the condition described 
above. 

Native mass spectrometry of GPCRs. Purified GPCRs were buffer exchanged 
into 200 mM ammonium acetate buffer pH 7.5 containing the mixed micelle 
preparation optimized for GPCR analysis as described previously. The concentrations 
of DDM, foscholine and CHS required to form a mixed micelle range from 
0.006-0.02%, 0-0.002%, and 0.001-0.01%, respectively, and are optimized for each 
receptor preparation. The samples were immediately introduced into a modified 
Q-Exactive mass spectrometer (Thermo), as described previously. Ions were 
transferred into the higher-energy collisional dissociation (HCD) cell following 
a gentle voltage gradient (injection flatapole, inter-flatapole lens, bent flatapole, 
transfer multipole: 7.9, 6.94, 5.9, 4 V, respectively). An optimized acceleration volt- 
age (100-130 V) was then applied to the HCD cell to remove the detergent micelle 
from the protein ions. Backing pressure was maintained at ~1.00 x 10-° mbar and 
data were analysed using Xcalibur 2.2 SP1.48. 

The bound-lipid identification experiments were performed with a modified 
Synapt G2 mass spectrometer (Waters) equipped with a Z-spray source**’. The 
typical instrumental setting was source pressure (4.5-5.0 mbar), capillary voltage 
(1.2-1.5 kV) and cone voltage (100-200 V). An extraction voltage of 1-5 V was 
applied and 80-150 V was used as the collision voltage with argon as the collision gas 
at a pressure of 0.2-0.3 MPa. To strip the detergent from protein ions in the source 
region, instrument values were optimized to capillary voltage (1.5 kV), cone voltage 
(200 V) and extraction voltage (3 V). A collision voltage ramp (from 20-100 V) 
was applied to dissociate protein-lipid complexes after quadrupole selection. 

Identification of preferential PtdIns(4,5)P2-binding sites on NTSR1. 
Unmodified NTSR1 and NTSR1 variants were pre-incubated at 1:1 molar ratio to 
produce a total protein concentration of 12 mM in protein buffer (10 mM HEPES 
pH 8, 150 mM NaCl, 0.01% (w/v) LMNG and 25% (v/v) glycerol). Detergent-solubilised 
PtdIns(4,5)P2 was then added to the protein mixture at a final molar ratio of 
1.25:1 lipid:receptor. The reaction mixture was incubated at 4°C for 5 min and 
analysed by mass spectrometry after buffer exchanging to 200 mM ammonium 
acetate buffer containing the mix of detergents of DDM, LMNG and foscholine 
as described previously’. 

The ratio of PtdIns(4,5)P2 binding to the receptor was calculated by normal- 
izing the intensity of the receptor in PtdIns(4,5)P2-bound states to the unbound 
state using UniDec software. The results were evaluated by comparing the ratio of 
PtdIns(4,5)P2 binding between mutants and the unmodified receptor and plotted 
as a bar chart using GraphPad Prism. 
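
The normalization described above can be sketched as follows. This is a minimal illustration that assumes the intensities of all lipid-bound states are summed before dividing by the unbound (apo) intensity; the text does not say whether bound states were summed or treated individually. The function names and the intensity values are hypothetical.

```python
def bound_to_apo_ratio(intensities):
    """Ratio of summed lipid-bound intensity to the apo intensity.

    intensities maps the number of bound lipids (0 = apo) to a peak intensity.
    """
    apo = intensities.get(0, 0.0)
    if apo <= 0:
        raise ValueError("apo (unbound) intensity must be positive")
    bound = sum(value for n_lipids, value in intensities.items() if n_lipids > 0)
    return bound / apo


def relative_binding(mutant, unmodified):
    """Binding ratio of a mutant expressed relative to the unmodified receptor."""
    reference = bound_to_apo_ratio(unmodified)
    if reference == 0:
        raise ValueError("unmodified receptor shows no bound intensity")
    return bound_to_apo_ratio(mutant) / reference


# Illustrative deconvolved intensities (not data from the paper)
unmodified = {0: 100.0, 1: 60.0, 2: 20.0}
mutant = {0: 100.0, 1: 30.0, 2: 10.0}
print(relative_binding(mutant, unmodified))
```

A value below 1 in this scheme would indicate that the mutation attenuates lipid binding relative to the unmodified receptor measured in the same spectrum.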

Mini-Gs and Nb6B9 coupling to β1AR. Effector coupling to β1AR was analysed 
using a modified Q-Exactive mass spectrometer after incubating purified β1AR 
with mini-Gs-Nb6B9 at 1:1.2 molar ratio at 4°C in protein buffer (20 mM Tris-HCl, 
pH 7.4, 350 mM NaCl and 0.02% DDM). The relative percentage of effector 
coupling was quantified by UniDec software. A time course was performed with 
aliquots sampled after 2, 10, 30, and 60 min to monitor the formation of the 
mini-Gs-receptor complex. To investigate the effect of PtdIns(4,5)P2 on coupling, 
β1AR was pre-incubated with detergent-solubilised PtdIns(4,5)P2 at 1:1 molar ratio 
for 5 min at 4°C to equilibrate before mixing with mini-Gs or Nb6B9 at 1.2 or 
0.3 molar ratio to receptor, respectively. For the analogous PS binding experiment 
we pre-incubated β1AR with a threefold higher concentration of detergent-solubilised 
PS than PtdIns(4,5)P2 (PS:β1AR, 3:1 molar ratio) for 5 min at 4°C to 
equilibrate before mixing with mini-Gs. 

Modelling and simulation system setup. Simulations were performed using the 
GROMACS v.4.6.3 simulation package. Initial protein coordinates were obtained 
using PDB ID 4BUO (NTSR1) and PDB ID 2Y03 (β1AR), with missing atoms 
added using MODELLER. In the case of β1AR, a model was also constructed 
in which S68 in the thermostabilized structure 2Y03 was back-mutated to R68 to 
reconstruct available basic residues in the wild-type receptor using the mutagenesis 
tool implemented in PyMOL v.1.3r1. Side-chain ionisation states were modelled 
using pdb2gmx*". The N and C termini were treated with neutral charge. Each 
protein structure was then energy minimized using the steepest descents algo- 
rithm implemented in GROMACS, before being converted to a coarse-grained 
representation using the MARTINI 2.2 force field”. The energy minimized 
coarse-grained structure was centred in a periodic simulation box with dimensions 
11 × 11 × 12 nm³. POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) 
molecules were randomly placed around the protein and the system was solvated 
and neutralised to a concentration of 0.15 M NaCl. An initial 50 ns of coarse- 
grained simulation was applied to permit the self-assembly of a POPC lipid bilayer 
around the GPCR. POPC lipids were randomly exchanged’ to create a mixed- 
species bilayer of specified composition (Extended Data Table 2). A cut-off distance 
of 2.5 nm was applied, with only molecules outside this distance being subject to 
exchange. The exchange protocol was conducted independently for each repeat 
simulation, such that different random initial configurations of lipids around the 
protein were generated for each simulation repeat. A summary of simulations 
performed is provided in Extended Data Table 2. 

Simulation details. The MARTINI force field’? was used to describe all system 
components. An ELNEDYN network“ was applied to the protein using a force 
constant of 500 kJ/mol/nm² and a cut off of 1.5 nm. Simulations were performed 
as an NPT ensemble, with temperature maintained at 310 K using a Berendsen 
thermostat with a coupling constant of τT = 4 ps, and semi-isotropic pressure 
controlled at 1 bar using a Berendsen barostat with a coupling constant of τP = 4 ps 
and a compressibility of 5 x 10~° bar’. Electrostatics were modelled using the 
reaction field coulomb type“, and smoothly shifted between 0 and 1.2 nm. Van der 
Waals interactions were treated using a shifting function between 0.9 and 1.2 nm. 
Covalent bonds were constrained to their equilibrium values using the LINCS 
algorithm“. Equations of motion were integrated using the leap-frog algorithm, 
with a 20-fs time step. All simulations were run in the presence of conventional 
MARTINI water, and neutralised to a concentration of 0.15 M NaCl. 

Analysis of simulation data was conducted using VMD“, PyMOL, tools imple- 
mented in GROMACS"|, and in-house protocols. Protein-lipid contact analysis 
employed a cut-off distance of 0.6 nm, based on radial distribution functions for 
coarse-grained lipid molecules’. 
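
The cut-off-based contact count can be sketched as below. This is an illustration rather than the in-house protocol: it assumes bead coordinates in nm, counts each lipid bead that lies within the cut-off of any bead of a residue, and ignores periodic boundary conditions. The function names and coordinates are hypothetical.

```python
import math


def contacts_in_frame(residue_beads, lipid_beads, cutoff_nm=0.6):
    """Count lipid beads within cutoff_nm of any bead of the residue (one frame)."""
    count = 0
    for lipid_bead in lipid_beads:
        if any(math.dist(bead, lipid_bead) <= cutoff_nm for bead in residue_beads):
            count += 1
    return count


def mean_contacts(frames, cutoff_nm=0.6):
    """Mean contacts per frame; frames is a list of (residue_beads, lipid_beads)."""
    if not frames:
        raise ValueError("at least one frame is required")
    total = sum(contacts_in_frame(res, lip, cutoff_nm) for res, lip in frames)
    return total / len(frames)


# Two toy frames: one lipid bead inside the cut-off in the first, none in the second
residue = [(0.0, 0.0, 0.0)]
frames = [
    (residue, [(0.5, 0.0, 0.0), (0.7, 0.0, 0.0)]),
    (residue, [(1.0, 0.0, 0.0), (0.0, 0.9, 0.0)]),
]
print(mean_contacts(frames))
```

A production analysis would apply the minimum-image convention for the periodic box and would normally use a trajectory library rather than nested loops.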

A2AR-mini-Gs PMF calculations. PMFs for the interaction of mini-Gs with A2AR in 
a lipid bilayer in the presence and absence of PtdIns(4,5)P2 were calculated using the 
MARTINI force field. To obtain a PtdIns(4,5)P2-bound A2AR-mini-Gs complex, 
we first ran ten coarse-grained molecular dynamics simulations on receptor 
embedded in an asymmetric complex membrane, each lasting 8 µs (Extended 
Data Table 2). The r.m.s.d. to the crystal structure of A2AR-mini-Gs complex 
(PDB ID 5G53) was calculated for the protein in these ten simulations, and the 
protein complex with the lowest r.m.s.d. was saved together with the membrane 
bilayer. The coarse grained mini-Gs was then docked back to the membrane- 
embedded receptor based on the A2AR-mini-Gs crystal structure to generate the 
starting configuration of a steered molecular dynamics (SMD) simulation. In the 
SMD, the mini-Gs was pulled away from the receptor along the z axis (normal 
to the membrane plane) at a rate of 0.05 nm/ns using a force constant of 
1000 kJ/mol/nm² while the receptor was restrained in place using a harmonic 
force of 1000 kJ/mol/nm². The distance between the centre of mass of the receptor 
and the mini-Gs was defined as the 1D reaction coordinate and the pulling process 
covered a distance of 3 nm. The initial configurations of the umbrella sampling 
were extracted from the SMD trajectory, spaced 0.05 nm apart along the reaction 
coordinate. Fifty umbrella sampling windows were generated, and each was 
subjected to 1-µs molecular dynamics simulation, in which a harmonic restraint of 
1000 kJ/mol/nm² was imposed on the distance between the centre of mass of the 
receptor and the mini-Gs to maintain the separation of the two. The PMF was 
extracted from the umbrella sampling using the weighted histogram analysis 
method (WHAM) provided by the GROMACS g_wham tool*!. A Bayesian boot- 
strap was used to estimate the statistical error of the energy profile. The PMF 
of the binding process in the absence of PtdIns(4,5)P2 was calculated following 
the same protocol, with the only change made to the lipid composition of the 
membrane lower leaflet. PtdIns(4,5)P2 was taken out from the membrane and 
instead the concentrations of POPC, 1,2-dioleoyl-sn-glycero-3-phosphocholine 
(DOPC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine (POPE) and 
1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE) were increased by 2.5% 
to make up for the vacancy left by the absence of PtdIns(4,5)P2. 
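
Choosing umbrella-sampling start frames from a pulling trajectory can be sketched as below. This is an illustration, not the authors' script: it assumes the centre-of-mass distance of each saved SMD frame is already available as a list (in nm), and it takes the window spacing and the number of windows as parameters. The function name and the synthetic distances are hypothetical.

```python
def pick_window_frames(distances_nm, spacing_nm=0.05, n_windows=50):
    """Indices of the frames closest to evenly spaced target distances.

    Targets start at the first frame's distance and increase by spacing_nm.
    """
    if not distances_nm:
        raise ValueError("no frames supplied")
    start = distances_nm[0]
    picks = []
    for window in range(n_windows):
        target = start + window * spacing_nm
        nearest = min(
            range(len(distances_nm)),
            key=lambda i: abs(distances_nm[i] - target),
        )
        picks.append(nearest)
    return picks


# Synthetic pull: frames saved every 0.01 nm over 3 nm (illustrative only)
smd_distances = [1.0 + 0.01 * i for i in range(301)]
window_frames = pick_window_frames(smd_distances)
print(window_frames[:5])
```

Each selected frame would then seed one restrained simulation, and the resulting histograms along the reaction coordinate are what a WHAM tool combines into the PMF.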

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All relevant data are available from corresponding authors on 
request. 

Extended Data Fig. 1 | Identification of lipids bound to 
NTSR1(HTGH4-ΔIC3B). a, Endogenous lipids bound to 
NTSR1(HTGH4-ΔIC3B), isolated from E. coli, are identified as PA 
following m/z selection in the mass spectrometry quadrupole of the 
NTSR1:lipid 11+ charge state (highlighted yellow) and collisional 
activation to dissociate PA and its homologues (m/z, 700-760 Da). 
b, Lipidomics analysis of purified NTSR1 with three technical replicates 
reveals peaks at low m/z. The MS/MS spectrum of the [M−H]− precursor ion at 
m/z 699.32, highlighted yellow, yields definitive fragment ions at m/z 281 
and 417, consistent with the structure of PA (36:2). c, Analogous lipidomics 
analysis of purified β1AR from insect cells with three technical replicates. 
MS/MS spectra of the two [M−H]− precursor ions (m/z 758.50 and 
786.53) identified the lipids as PS (34:2) and PS (36:2), respectively, with 
diagnostic fragments indicated. 

Extended Data Fig. 2 | Lipid-binding preference of NTSR1 and β1AR. 
a-e, The binding of NTSR1(HTGH4-ΔIC3B), measured by mass 
spectrometry (n = 3 independent experiments), to the phospholipids 
PA (a), PS (b), PI (c), PC (d) and DAG (e). The measurements were 
performed at different lipid concentrations (0 to 160 µM) and the 
percentages of individual lipid-binding peaks (sum of apo protein and all 
lipid adducts obtained in the region of the mass spectrum under study) 
were plotted against lipid concentrations in solution. The lipid-binding 
curves were deduced from fitting to one-site total binding. Values of s.d. 
were calculated from three independent replicate experiments at each 
concentration. The results show that NTSR1 interacts preferentially 
with anionic phospholipids (PA and PS), as no binding was observed for 
neutral (DAG) and zwitterionic (PC) lipids. f, g, Exogenous POPS (f) 
and PtdIns(4)P (g) were added to β1AR at different final concentrations 
(10 µM is shown here). Spectra were recorded for a range of lipid 
concentrations from 0 to 80 µM for PS and 0 to 20 µM for PtdIns(4)P. Peak 
intensities of the individual PtdIns(4)P-bound species were measured 
and plotted against lipid concentration to yield a relative affinity for one 
PtdIns(4)P binding (1×), two PtdIns(4)P molecules binding (2×) or three 
PtdIns(4)P molecules binding (3×); only the first PtdIns(4)P molecule 
binds with high affinity (see Fig. 1a). Data are mean ± s.d. from three 
independent experiments. 

Extended Data Fig. 3 | Investigation of the phospholipid preferences 
of A2AR and NTSR1. a, A representative mass spectrum of purified 
A2AR from three independent experiments revealed truncations of the 
N-terminal sequence (MPIM). The arrows between species indicate the 
mass differences corresponding to truncated amino acids (M, PI and M). 
b, A competitive binding assay (n = 3 independent experiments) in which 
A2AR was incubated with a mixture of lipids (PI, PtdIns(4)P, PtdIns(4,5)P2, and 
PtdIns(3,4,5)P3) before mass spectrometry, indicated that PtdIns(4,5)P2 
binds with a higher affinity than the other phospholipids to A2AR. c, The 
analogous competitive binding assay, in which NTSR1 was incubated with 
a mixture of lipids (PI, PtdIns(4)P, PtdIns(4,5)P2 and PtdIns(3,4,5)P3) before 
mass spectrometry. Ratio to apo is plotted as a function of concentration 
and defined as the ratio of the intensity corresponding to individual 
PI phosphate adducts to the receptor in the apo state (inset). The same 
data analysis methods are used for Fig. 1b. PtdIns(4,5)P2 binds with a 
higher affinity than the other phospholipids to A2AR. Data are shown as 
mean ± s.d. from three independent replicates. d, A representative mass 
spectrum of A2AR (n = 3 independent experiments) used for preparation 
of the G-protein complex reveals lower abundance of PS and PI adducts 
prior to coupling to G proteins. 

Extended Data Fig. 4 | NTSR1-PtdIns(4,5)P2 and β1AR-PtdIns(4,5)P2 
interactions within CGMD simulations, and comparison of PtdIns(4,5)P2 
contacts among different GPCRs. a, Volumetric density surfaces 
showing the average spatial occupancy of PtdIns(4,5)P2 lipids around 
a crystal structure of NTSR1(TM86V-ΔIC3B) (PDB: 4BUO), which 
shares a greater sequence identity to the wild-type receptor (91%) than 
NTSR1(HTGH4-ΔIC3B) (86%), contoured to show the major 
PtdIns(4,5)P2-interaction sites. Density surfaces were calculated over 5 µs of CGMD 
(blue surface, n = 10 independent experiments), and 100 µs of CGMD 
(magenta, n = 1 experiment). The cytoplasmic side of NTSR1 structure 
is coloured from white (low PtdIns(4,5)P2 interaction) to red (high 
PtdIns(4,5)P2 interaction). Extending a simulation to 100 µs revealed no 
overall change in the patterns of PtdIns(4,5)P2 interaction. Less specific, 
and hence more dynamic, interaction was seen for the acyl chain moieties 
of PtdIns(4,5)P2, which yielded more diffuse probability densities. 
b, β1AR-PtdIns(4,5)P2 interactions within CGMD simulations. Contact 
patterns are shown for simulations containing 5% PtdIns(4,5)P2 in the 
lipid bilayer and thermostable β1AR (PDB: 2Y03, top), 10% PtdIns(4,5)P2 
and thermostable β1AR (middle), and 10% PtdIns(4,5)P2 and β1AR(S68R) 
construct (bottom). In each case PtdIns(4,5)P2 contacts were calculated 
over 5 µs of CGMD (n = 10 independent experiments; error bars, s.d.), 
with each repeat simulation initiated from different random system 
configurations. c, PS and PtdIns(4,5)P2 contacts with NTSR1 as a function 
of residue position, for PC:PS membranes (top left), PC:PS:PtdIns(4,5)P2 
membranes (top right), PC:PtdIns(4,5)P2 membrane (bottom left) and 
PC:PS:PtdIns(4,5)P2 (bottom right). The position of helices is denoted by 
horizontal grey bars. Lipid contact is calculated as the mean number of 
contacts between each residue and a given lipid species per frame, using a 
6 Å distance cut-off. n = 3; error bars, s.d. d, PtdIns(4,5)P2 contacts seen 
in CGMD simulations for nine class A GPCRs: histamine H1 receptor, 
PDB 3RZE; β1 adrenergic receptor, 2VT4; β2 adrenergic receptor, 2RH1; 
CB1 cannabinoid receptor, 5TGZ; M4 muscarinic acetylcholine receptor, 
5DSG; adenosine A2A receptor, 3EML; dopamine D3 receptor, 3PBL; 
sphingosine 1-phosphate receptor, 3V2W; rhodopsin, 1F88. GPCR 
sequences are shown, with TM helices, intracellular loops (ICL) and 
H8 helices indicated by horizontal bars, and with amino acids coloured 
according to the mean number of contacts per simulation frame with the 
PtdIns(4,5)P2 molecules. Green boxes correspond to the high frequency of 
PtdIns(4,5)P2 interactions discussed in the main text for the TM1, TM4, 
and TM7/H8 motifs of NTSR1. Contacts were computed over 1 µs CGMD 
simulations (n = 3 independent experiments) for each GPCR, using a 6 Å 
cut-off. Sequences were aligned using T-Coffee and mapping of protein-lipid 
contact data onto the sequence alignment used ALINE. 

Extended Data Fig. 5 | Site-directed mutagenesis attenuates PtdIns(4,5)P2 
binding to NTSR1. a, Schematic representation of the experimental 
protocol designed to combine mass spectrometry with mutagenesis to 
produce mutants of lower molecular mass than wild type, which, when 
incubated with PtdIns(4,5)P2, yield a direct readout of the effect of 
mutations in specific regions. b, PtdIns(4,5)P2 binding of NTSR1 mutants 
on residues that exhibit the highest frequency of PtdIns(4,5)P2 interaction 
in molecular dynamics simulation. Mutation of NTSR1(HTGH4-ΔIC3B) 
residues on TM1 (R46G, K47G and K48G (R43G, K44G and K45G 
in NTSR1(TM86-ΔIC3B); R91G, K92G, K93G in wild type)), TM4 
(R138I, R140T, K142L and K143L (R135I, R137T, K139L and K140L in 
NTSR1(TM86-ΔIC3B); R183I, R185T, K187L and K188L in wild type)) 
and TM7-H8 (R316N (R311N in NTSR1(TM86-ΔIC3B); R377N in wild 
type)) attenuate PtdIns(4,5)P2 binding, and indicate that the TM4 interface 
is a preferential binding site over TM1 and TM7-H8 interfaces. Selection 
of residues for mutations was guided by molecular dynamics (Extended 
Data Fig. 4) and previous studies in which binding of a fluorescently 
labelled agonist, BODIPY neurotensin, to NTSR1, was screened and used 
to monitor efficient production, insertion, and folding. 

Extended Data Fig. 6 | PtdIns(4,5)P2 binds preferentially to β1AR in 
an active state and stabilizes β1AR coupled to mini-Gs and A2AR-mini-Gs 
complex. a, A time-course experiment was performed to monitor 
the formation of active β1AR-mini-Gs complex. The coupling efficiency 
(percentage) was calculated from the relative intensity of peaks assigned 
to β1AR-mini-Gs coupling in the appropriate lipid-bound state. The plot 
indicates that mini-Gs coupling is enhanced by PtdIns(4,5)P2 when more 
than two lipid molecules are bound to the receptor. Error bars represent 
s.d. from at least three independent experiments. b, Plot of PMF for the 
interaction of mini-Gs with A2AR in the presence of PtdIns(4,5)P2 (green) 
or PS (grey). The PMF is calculated along a reaction coordinate (Δz) 
corresponding to the centre-centre separation of the mini-Gs and receptor 
proteins along the z axis (normal to the bilayer plane). The interaction 
of mini-Gs with the A2AR is stabilized in the presence of PtdIns(4,5)P2 
by 50 ± 10 kJ mol⁻¹ relative to PS. Error bars (which are <10 kJ mol⁻¹) 
are from bootstrap sampling of the PMFs and therefore represent 
the ‘statistical’ errors in estimating the well depth from a given set of 
simulations and PMF calculation (n = 3 independent experiments). We 
therefore estimate a minimum error of <10 kJ mol⁻¹. c, Mass spectra were 
recorded for a 1:1 equimolar mix of an inactive unliganded β1AR variant, 
E130W, and its unmodified active counterpart (co-purified with the 
agonist isoprenaline) in the presence of PtdIns(4,5)P2. Lipid binding occurred 
on both receptors, but following normalization to account for differences 
in ionization efficiency, a clear preference for PtdIns(4,5)P2 binding to the 
active receptor was observed. Bars represent mean ± s.d. 

Extended Data Fig. 7 | Detection of nanobody coupling to β1AR. 
Peaks in the mass spectrum assigned to Nb6B9 binding to β1AR to form 
an equimolar β1AR-Nb6B9 complex are highlighted in orange, and 
demonstrate complete complex formation, implying that nanobody has a 
higher affinity than mini-Gs for β1AR. n = 3 independent experiments. 

Extended Data Fig. 8 | Structural comparison of class A and class B 
GPCRs in complex with trimeric Gay complexes. The PtdIns(4,5)P2 
contacts of the Gαs subunit observed in molecular dynamics simulations 
(green spheres) are highlighted on the structures of trimeric G-protein 
interactions with β2AR (PDB: 3SN6), the glucagon-like peptide-1 receptor 
(GLP-1) (PDB: 5VAI) and the calcitonin receptor (CTR) (PDB: 5UZ7). 
Basic residues on the interface adjacent to the cytoplasmic end of TM4 
are highlighted as purple spheres. Lower panels show an expanded view, 
highlighting the conserved pattern of PtdIns(4,5)P2 bridging in class A 
GPCRs (β2AR and A2AR (Fig. 3e)), both of which have basic residues on 
TM4 (Lys140 and Arg107/111) that are not present in the class B GPCRs 
GLP-1R and CTR. 

Extended Data Table 1 | Lipidomics analysis of purified β1AR 
Lipid ID 
PC (36:0) 
PE (34:1) 
PE (36:2) 
PE (34:2) 
PE (38:1) 
PE (37:4) 
PG (38:6) 
PI (32:1) 
PI (34:2) 
PI (36:0) 
PI (34:1) 
PI (36:1) 
PI (36:2) 
PI (38:1) 
PS (34:2) 
PS (34:1) 
PS (36:2) 
PIP 423 
CL (648 
CL (664 
CL (64 

Extended Data Table 2 | Simulations run 

Name                 Length        Bilayer composition 
NTSR1                10 × 5 µs     POPC(95%):PIP2(5%) 
NTSR1                3 × 5 µs      POPC(95%):PS(5%) 
NTSR1                10 × 5 µs     POPC(95%):PS(5%):PIP2(5%) 
NTSR1 extended       1 × 100 µs    POPC(95%):PIP2(5%) 
β1AR (5%)            10 × 5 µs     POPC(95%):PIP2(5%) 
β1AR (10%)           10 × 5 µs     POPC(90%):PIP2(10%) 
β1AR (S68R, 10%)     10 × 5 µs     POPC(90%):PIP2(10%) 
A2AR-mini-Gs         10 × 8 µs     POPC(95%):PIP2(5%) 

Lipids were symmetrically distributed between leaflets. 

Retired palaeontologist Michael Wuttke takes lignite samples with a drill stick, near Darmstadt, Germany. 


Stick retirement! 


Scientists who step back from full-time work can find plenty 
of ways to remain active in their research field. 


BY AMBER DANCE 


Louis Chen was technically meant to 
retire in 2005. The mathematician at the 
National University of Singapore was 
turning 65, the university's official retirement 
age. But he was only five years into his tenure 
as director of the university’s new Institute for 
Mathematical Sciences, and the university 
wanted him to stay on. So he remained for 
seven more years, stepping down in 2012. Over 
the next 18 months, he travelled and had knee 
surgery, before returning in summer 2014 to 
teach graduate courses for a year. 

Then, in 2015, Chen’s provost took him 
to lunch. “He told me that maybe it was time 
for me to leave,” says Chen, who was happy to 
retire. But he still hasn't really left: he’s at his 
university office three or four times a week. 
“I cannot abandon my research,” says Chen. 
“It’s a passion.” In July 2015, he was appointed 
emeritus professor, a title that comes with perks: 
he’s eligible to apply for grants, and continues 
his research on probability and statistics. He 
maintains his e-mail address and library access 
and, he’s delighted to say, “free parking for life”. 


GO YOUR OWN WAY 

There are as many ways to retire as there are 
scientists; there’s no right or wrong path. Many 
researchers wish to continue their academic 
lives in one way or another. The emeritus title 
can allow scientists to keep laboratory or office 
space or apply for grants; associated privileges 
vary widely. However, research funds probably 
won’t flow as generously as they used to, and 
emeriti typically downsize their research space 
and teams. Some retired scientists turn to other 
projects, such as writing books or doing charity 
work. The key to a fulfilling retirement, say 
those who are pleased to have stepped down 
from full-time work, is to line up positions and 
projects, and to prepare for the emotional toll 
that the transition can take. 

Worldwide, the ranks of those aged 60 or 
older are expected to rise. The United Nations, 
for example, predicts that by 2050, 21% of the 
world’s population will be at least 60 years 
old, up from 10% in 2000. Among scientists, 
in particular, the average age is climbing. In 
the United States, scientists’ average age rose 
from 45 to 48.6 between 1993 and 2010, and 
it is expected to climb further. The trend is 
similar in Europe. 

National rules on retirement vary widely. In 
Sweden, for example, it is mandatory at age 67; 
in South Africa, at 65. The United States and 
Canada have no mandatory retirement age. 
Although data on active retirees and emeriti 
are scarce, a 2014 survey of retired medical 
professors from 20 countries found that many 
continued to teach, and that more than 40% 
had published at least one paper or book in the 
previous year (N. G. De Santo et al. QJM 107, 
405-407; 2014). 

Researchers who are nearing retirement 
should start preparing for it as early as possible, 
advises Amy Strage, assistant vice-president for 
faculty development at San Jose State Univer- 
sity in California. There could be many options 
to research and decisions to weigh. For exam- 
ple, at some universities where retirement is a 
choice, faculty members can take advantage 
of phased retirement plans. That means they 
can wind down their research while working 
part-time, so long as they commit to a date 
for full retirement within a few years. Those 
required to step down from their positions at 
a certain age might be able to arrange unpaid 
positions, or jobs in countries with a higher 
retirement age. 

Some retired faculty members gain emeritus 
status, although the meaning of that title var- 
ies widely between institutions and nations. 
At some universities, it’s granted pro forma to 
retiring full professors. At others, it’s an honour 
bestowed only on pre-eminent researchers. “It 
is retirement with distinction,” says Kimberly 
Read, assistant director for the Florida Center 
for Inclusive Communities at the Univer- 
sity of South Florida (USF) in Tampa. Read 
researched retirement and emeritus issues, 
focusing on the oral history of an emeritus 
professor, for her 2016 PhD thesis at USF. 

Emeritus is the final rung on the academic 
trajectory from assistant professor to associate 
to full. Obtaining this ultimate promotion is 
often much like gaining those earlier ones, with 
a committee evaluating a person’s research or 
service contributions to the university, and 
administrators approving a decision to award 
the honour. 

For some, the emeritus title is a final feather 
in their academic cap as they head through 
the door. Others take it as a commitment to 
further engagement with the university. “You 
want to continue to help the department,” 
explains Dean Martin, an emeritus professor 
of chemistry at USF and Read's research sub- 
ject. Every morning, he comes to his office, 
where he does research and publishes papers, 
mentors students, edits the departmental 
newsletter and raises funds for the department. 


LASTING BONDS 
Keeping in touch 

Full-time researchers interact daily with 
colleagues and students, but retirees 
risk losing that sense of community. 
Organizations can help to restore it. Here 
are a few examples. 

• The European Association of 
Professors Emeriti welcomes retired 
professors from all European 
universities, as well as corresponding 
members from abroad. From 68 
founding members in 2016, the 
organization has grown to nearly 200. 
• The US-based Association of 
Retirement Organizations in Higher 
Education lists about 100 such 
communities, often associated with a 
particular university or other academic 
organization. 
• The Emeritus College of Arizona State 
University (ASU), in Tempe, welcomes 
emeriti of ASU and associate members 
from elsewhere. Member activities 
include memoir classes and helping 
students to prepare for an international 
science competition. A.D. 

Pharmacologist Edith Sim enjoys her retirement. 

Achieving emeritus status gives retiring 
professors a lasting connection with their 
university. They might or might not be given 
their own office, but they will typically have 
access to resources such as the gym, library 
and e-mail. They might also be given admit- 
tance to emeritus associations, which provide 
camaraderie (see ‘Lasting bonds’). They will 
not, however, receive a salary. 

Access to grant money varies from one 
country to another, but maintaining emeri- 
tus status and a university affiliation is often 
enough to make a retired researcher eligible, 
at least to apply. “Since I retired I’m busier than 
ever, writing papers, travelling to meetings and 
giving talks,” says George Ellis, 79, an emeritus
mathematician at the University of Cape Town
in South Africa. “The main issue is funding.”

For several years, he held on to a small grant 
of 100,000 rand (about US$7,500) from the 
South African National Research Foundation. 
The amount allowed him to attend overseas 
conferences and invite researchers from other 
nations to visit and collaborate. But cuts at the 
foundation have caused the grants to dry up. 
Now, he plans to attend conferences only if the 
hosts pay for his trip. 

Fortunately for Ellis, his research requires 
little funding and few resources. “They can’t 
keep one from thinking and reading and writing,”
he points out. With his emeritus status, he's
able to keep an office in his department, and he 
continues to work with colleagues and students. 

Some researchers manage to keep a lab 
going. Martin has continued to win grants well 
into his retirement. But he is an outlier in terms 
of the amount of work he does in his retirement 
and, at times, this has generated confusion. His 
grant funds are deposited in a research bank
account — but more than once, the university 
assumed he was inactive and transferred those 
funds into central accounts. (His dean and 
department chair restored the money.) 

Many retired scientists, however, can’t 
maintain the lab space and funding that they 
did as active researchers. They might not 
want to continue competing for grants, and 
the space might be needed for new faculty 
members. That doesn’t mean that retired 
researchers can’t make academic contribu- 
tions, says Strage. They might shift from bench 
science to less-space-intensive activities such 
as giving speeches, guest teaching or reviewing 
manuscripts, she explains. 

Others find new lab space. Michael Wuttke, a 
vertebrate palaeontologist, engineered his own 
post-retirement research position. In 2015, aged 
65, he left his job at the General Directorate 
for Cultural Heritage Rhineland-Palatinate in 
Mainz, Germany. But he had started seriously 
considering his next steps a few years earlier. 
He set up a position as ‘designated volunteer’ at
the Senckenberg Research Institute and Natural 
History Museum in Frankfurt, and since 2015 
he has been working with specimens from fos- 
sil sites such as the Messel Pit, a disused quarry 
near Frankfurt, where he did his PhD research. 
Wuttke has access to the same resources and 
scientific equipment that employees have. 
He expects to publish soon on a previously 
unknown species of frog, whose fossilized 
remains were discovered at the site. 


TEAM SPIRIT 

Scientists can also remain active in research by 
continuing to correspond with the team they 
once led. Stem-cell physician Outi Hovatta 
retired from the Karolinska Institute in 
Stockholm, at the age of 70. Although Sweden's 
retirement age is 67, she was able to stay on for 
three more years as a senior faculty member 
— provided she funded her own salary from 
grants. At 70, she was happy to return to her 
family home in Helsinki. 

She passed the research group on to a 
colleague, but stays active as a professor emer- 
ita. Now 72, Hovatta continues to correspond 
with her Karolinska colleagues. Including her 
name on grants helps them to obtain funding, 
and she comments on draft publications. 

Edith Sim, 67, a retired pharmacologist and 
emerita professor of the University of Oxford, 
UK, and Kingston University London, has also 
kept up her research without a lab. She's writ- 
ing papers based on unpublished data that she 
had collected earlier, and on fresh data from 
collaborators. She recently published a book, 
co-authored with a former student, on how 
certain enzymes affect a person’s response to 
drugs. 

Sim is also involved in charitable work. 
This includes running a Saturday morning 
programme to give teenagers a taste of what 
science is like before they commit to studying 
it further. For example, the teens in her pilot 
programme at Kingston played with wind
tunnels and microscopes and interviewed 
astronauts. Sim is also a trustee with the 
Daphne Jackson Trust, a UK charity that 
helps scientists who have had a career break 
to return to research. 

Of course, scientists don’t always find 
retirement easy. Nancy Schlossberg thought 
it would be “a piece of cake”. In 1997, at the 
age of 68, and after 24 years as a counselling 
psychologist at the University of Maryland 
in College Park, she became an emerita and 
headed for Sarasota, Florida. There, she 
hoped she’d find ways to write or get speak-
ing invitations. But it wasn’t that simple. “I
get to Sarasota, and there I am without a
purpose,” she recalls. “I was shocked.”

Schlossberg had to create her own 
opportunities. Because her professional 
expertise was in life transitions, she decided 
to study retirement. That work led to the 
first of three books that she has penned 
on the topic since ‘retiring’: Retire Smart, 
Retire Happy (American Psychological 
Association, 2003). 

Through her post-retirement research, 
Schlossberg has worked out why it didn’t 
feel good to be suddenly purposeless in 
sunny Florida. Those who have retired, she 
learned, must shape a new identity. “That
transition process, even if it’s something you
wanted, can be very unsettling,” says Schlossberg.
“But the most important thing tied to your identity
is your sense of purpose. That's what gives you the
reason to get up in the morning.” Of course,
that sense of purpose or identity needn't 
be related to previous professional activi- 
ties — but Schlossberg’s interviews with 
retired researchers indicate that they need 
something to define their lives. 

“You do feel, a little bit, that you might 
be kind of sidelined,” says Chen, who has 
noticed that he receives fewer invitations 
to present at or to organize talks and con- 
ferences. But he’s not terribly bothered. “I 
think you have to accept this,” he says. 

And in any case, Chen notes, he now has 
time to lunch with old friends whenever they 
call, and to rekindle old hobbies: singing, 
and playing the recorder and cello. 

Sim also struggled a little at first. As a 
full-time scientist, she’d found solace in
her garden. Once she retired, gardening no 
longer offered the same sense of escape. It 
took time to rediscover the joy of tending 
the plants. Today, she says, retirement feels 
good. “Now that I’ve got used to it,” says 
Sim, “it’s a very nice way to live.”




Amber Dance is a freelance writer in Los 
Angeles, California. 


COLUMN 


More than a meeting 


Convene a colloquium, says Francesco Sciortino. 


Organizing a scientific conference can
be a daunting prospect. You know
that it could offer exceptional career
benefits by boosting your network and help- 
ing you to develop those famous soft skills: 
communication, teamwork and time manage- 
ment. But you might think the process involves 
unacceptable levels of stress, complications to 
your unpredictable schedule and even more 
delays to that unfinished project. 

Still, you should consider the option. You'll 
refine skills that are not necessarily innate and 
that you'll need in any job. Why not hone them 
in the setting of an enthusiastic student group? 

I became involved in student activism during 
my high-school days in Italy, before I moved 
to the United Kingdom in 2010 to study phys- 
ics. As an undergraduate at Imperial College 
London, I joined student associations to 
meet like-minded people and to get a taste 
of a variety of research fields. I set up tours to 
my department's laboratories and found the 
gratitude of other students to be extremely 
rewarding. Through the Imperial College 
Physics Society, I also co-organized a number 
of trips, some of which later inspired me to pur- 
sue a PhD in plasma physics — none more so 
than our visits to the Culham Centre for Fusion 
Energy near Oxford in 2013, 2014 and 2015. 

A different chapter began in August 2014, 
when I and six others joined together to 
found the Italian Association of Physics 
Students (AISF). Since then, our group has 
grown to more than 1,000 members in Italy 
and has become one of the most active in the 
International Association of Physics Students 
(IAPS). We have organized public lectures, 
lab tours and outreach events, offering simple 
demonstrations to school groups of all ages 
and engaging in the International Year of Light 
celebrations in 2015, which aimed to highlight 
the importance of light and optical technolo- 
gies. Since then, the AISF has also set up annual 
visits to Italy’s Gran Sasso National Laboratory 
in Abruzzo, the European Gravitational 
Observatory near Pisa and other leading 
research facilities. The Italian Conference of 
Physics Students has become our key annual 
gathering, bringing together more than 
100 students from institutions nationwide in 
a different city each year. 

In 2015, one year after we founded the 
AISF, we submitted a bid to host the 32nd
International Conference of Physics Students 
(ICPS). It sounded a little over-ambitious at first, 
but we demonstrated that our association could 
raise the necessary funding and institutional 
support. It could hardly have gone better.
In August 2017, the ICPS took place in the 
Italian city of Turin with 450 participants from 
44 countries, and included almost 200 talks and 
posters from university students of all levels. 
I was part of an outstanding team that helped to 
exhibit the best of Italian academic research, the 
wonders of our national cuisine and local artis- 
tic treasures. Our programme included trips to 
the Turin Astrophysical Observatory, Sacra di 
San Michele Abbey and traditional wine cellars. 

Organizing student events shapes how you 
collaborate with people. I discovered what 
kind of team player I am. I learnt that bal- 
anced group dynamics, rather than individual 
herculean efforts, best foster motivation, 
enthusiasm and effectiveness. I’ve always 
wanted my impact to exceed my direct reach, 
and so connecting with others who could carry 
my efforts forward was essential. Seeing other 
people independently repeat events that I initi- 
ated has been extremely rewarding. 

I started out with pragmatism, but little 
understanding of the art of compromise. That's 
now been forced into me by countless online 
meetings, most recently as part of a commit-
tee to reform the regulations of the IAPS. The
international setting of these efforts also gave 
me chances to travel, practise languages and 
gain exposure to fund-raising. I've developed 
important friendships and boosted the com- 
petitiveness of my PhD applications, which in 
turn brought me to the United States. 

Joining student associations, organizing 
events all over Europe and becoming part of a
community of enthusiastic young scientists has 
helped me to go beyond lecture halls, research 
labs and supervisor meetings. The skills that 
I gained have given me the freedom to enjoy 
much more of my own scientific career.


Francesco Sciortino is a PhD candidate in 
plasma physics at the Massachusetts Institute 
of Technology in Cambridge. 
SCIENCE FICTION


PAPA BEAR

Cold comfort.


BY KURT PANKAU


Images flash in my head. Evacuations.
Fires. Bodies. The long winter.

Cold. I’m on a mattress on the floor,
one arm draped over a woman I don’t rec-
ognize. It’s dark, but I can see wheel wells.
So it’s not the floor — the back of a truck,
maybe? A van?

The sleeping woman in front of
me is facing away, but her face looks
young, maybe early forties. She’s
wrapped in several layers of rag-
ged clothing. And so am I, I realize.
I reach a hand up to feel my face and
find a thick beard. I never wore a
beard. I don’t have the slightest idea
how to maintain one.

I sit up and place a hand on the
window. There’s a curtain, but
through the thin cloth I can feel the
freezing pane of glass. I pull aside the
curtain and I can see a foot of snow
reflected in the moonlight. The sky
is dark purple. I grope around for the
door.

Where am I?

“Albuquerque, I think,” says the
woman. “Come back to bed, Papa
Bear.” I don't recognize her voice.

“I’m just going to take a leak,” I say.
I need to get out of here.

“We emptied your bag three hours
ago,” she said.

I reach a hand instinctively to my mid-
dle and feel a twitch in my abdomen as my
fingers nudge a plastic line attached to the
stoma near my... How do I know what a
stoma is?

“Come back to bed.”

“Who are you?” I ask.

“It’s me,” she says. “Lisa.”

“Lisa’s long dead,” I say. How do I know
that? Dear God, Lisa. When did she die?

“Dammit,” says the woman.

“Why did you say that?” I ask, my anger
rising. “Why would you pretend to be my
wife?”

“I wasn't pretending to be your wife.” The
woman makes a snorting noise. “I never
know what you're going to remember.”

“Remember?”

“During your spells.”

“Spells?” I fumble for the door.

“Please don't,” she


engine,” I say, moving to the front of the
van. There's no key, but the panel under the 
steering wheel has been ripped off and wires 
dangle, exposed. The passenger’s seat is filled 
with gasoline cans and jugs of water. 

Scraps of paper with a tangle of notes in 
my handwriting. My fingers are trembling. 
Letters to myself. Pieces of a puzzle I can’t 
quite assemble. 


It’s like the past... 

“Like the past is running away from you?” 

I feel hands on my shoulder and her nose 
on the back of my neck. The touch is famil- 
iar. 

“How did you know —?” I start to ask. 

“You were thinking out loud,” says the
woman. “You do that.”

“Oh,” I say.

“I’m cold, Papa Bear,” she says. “Please
come back to bed. I’ll explain in the
morning.”

“Explain it to me now,” I say, turning to
face her. “Tell me everything.”

“You’ll never get back to sleep,” says the
woman. “There's too much. Come back to
bed. I want you to sing to me.”

“Who are you?” I ask.

“I’m your wife,” she says.

“Lisa’s dead,” I say. 

“So is my first husband. So are a lot of 
people.” She wraps her arms around my 
neck and presses her lips to mine. The hairs 
of my beard rub against my chin and tickle 
my face. The kiss is brief, but I know those 
lips. She pulls back and rests her forehead
against mine. 

“Tell me your name, at least,” I say. 

“Jennifer,” she says. Her voice is quavering. 

Oh no, I’ve done something wrong. 

“No, you haven't,” she says, shaking her 
head gently. 

“I’m just ... I’m having trouble,” I say. My
voice is hoarse and raspy.

“I know, Papa Bear,” she says, snif-
fling. “I promise, I'll tell you every-
thing in the morning. Just come back
to bed.”

“Why are you crying?” I ask.

“Because a long time ago, the 
world ended,” she says. “The world 
ended and you saved us and I fell in 
love with you. And because nobody 
else gives a damn about you, even 
though you saved all our lives” — she 
swallows hard — “because you get 
confused sometimes.” 

I look out the window. There are 
a dozen other cars around us. Like a 
caravan. Beyond them, lean-tos and 
the skeletons of small structures. 
They’re building something. No, 
they’re rebuilding. Why? We need 
to stay mobile. Don’t we? 

“I know where this conversation 
is heading and I can't do it again, 
not tonight,” she says. “I just can’t. 
Please come to bed. I promise, every- 
thing will be better in the morning. 
Just come back to bed.” Tears glisten in the
moonlight. In this cold, they must sting her 
cheeks. 

I nod. She pulls me back to the mattress. 
She lies down and holds my arm tightly to 
her chest. Her face flashes through my mind. 
I can feel the weight of a thousand memories 
hidden behind a fog. 

“I love you, Papa Bear,” she says. Her voice
is almost a whimper. 

“I love you too, Baby Bear,” I say, more 
from instinct than memory. 

Her gentle sob is interrupted by a burst 
of tearful laughter. I guess I’ve remembered
something important. I pull her closer to me, 
to shut out the cold, the emptiness, the void 
in my head where the past should be. I close 
my eyes. 

Everything will be better in the morning.


Kurt Pankau lives with his family in
St Louis, Missouri. He loves board games,
dad jokes and stories about time travel. He 
tweets at @kurtpankau and occasionally 
blogs at kurtpankau.com. 